Yi-Ping You | Researchain

Archive Network Publication Hotspot Collaboration

Network

Latest external collaboration on country level. Dive into details by clicking on the dots.

Explore More

Hotspot

Dive into the research topics where Yi-Ping You is active.

Explore More

Publication

Featured researches published by Yi-Ping You.

ACM Transactions on Design Automation of Electronic Systems | 2006

Compilers for leakage power reduction

Yi-Ping You; Chingren Lee; Jenq Kuen Lee

Power leakage constitutes an increasing fraction of the total power consumption in modern semiconductor technologies. Recent research efforts indicate that architectures, compilers, and software can be optimized so as to reduce the switching power (also known as dynamic power) in microprocessors. This has lead to interest in using architecture and compiler optimization to reduce leakage power (also known as static power) in microprocessors. In this article, we investigate compiler-analysis techniques that are related to reducing leakage power. The architecture model in our design is a system with an instruction set to support the control of power gating at the component level. Our compiler provides an analysis framework for utilizing instructions to reduce the leakage power. We present a framework for analyzing data flow for estimating the component activities at fixed points of programs whilst considering pipeline architectures. We also provide equations that can be used by the compiler to determine whether employing power-gating instructions in given program blocks will reduce the total energy requirements. As the duration of power gating on components when executing given program routines is related to the number and complexity of program branches, we propose a set of scheduling policies and evaluate their effectiveness. We performed experiments by incorporating our compiler analysis and scheduling policies into SUIF compiler tools and by simulating the energy consumptions on Wattch toolkits. The experimental results demonstrate that our mechanisms are effective in reducing leakage power in microprocessors.

languages and compilers for parallel computing | 2002

Compiler analysis and supports for leakage power reduction on microprocessors

Yi-Ping You; Chingren Lee; Jenq Kuen Lee

Power leakage constitutes an increasing fraction of the total power consumption in modern semiconductor technologies. Recent research efforts also indicate architecture, compiler, and software participations can help reduce the switching activities (also known as dynamic power) on microprocessors. This raises interests on the issues to employ architecture and compiler efforts to reduce leakage power (also known as static power) on microprocessors. In this paper, we investigate the compiler analysis techniques related to reducing leakage power. The architecture model in our design is a system with an instruction set to support the control of power gating in the component levels. Our compiler gives an analysis framework to utilize the instruction to reduce the leakage power. We present a data flow analysis framework to estimate the component activities at fixed points of programs with the consideration of pipelines of architectures. We also give the equation for the compiler to decide if the employment of the power gating instructions on given program blocks will benefit the total energy reductions. As the duration of power gating on components on given program routines is related to program branches, we propose a set of scheduling policy include Basic_Blk_Sched, MIN_Path_Sched, and AVG_Path_Sched mechanisms and evaluate the effectiveness of those schemes. Our experiment is done by incorporating our compiler analysis and scheduling policy into SUIF compiler tools [32] and by simulating the energy consumptions on Wattch toolkits [6]. Experimental results show our mechanisms are effective in reducing leakage powers on microprocessors.

Concurrency and Computation: Practice and Experience | 2007

PALF: compiler supports for irregular register files in clustered VLIW DSP processors

Yung-Chia Lin; Yi-Ping You; Jenq Kuen Lee

A wide variety of register file architectures—developed for embedded processors—have recently been used with the aim of reducing power dissipation and die size, in contrast with the traditional unified register file structures. This article presents a novel register allocation scheme for a clustered VLIW DSP, which is designed with distinctively banked register files in which port access is highly restricted. Whilst the organization of the register files is designed to decrease power consumption by using fewer port connections, the cluster‐based design makes register access across clusters an additional issue, and the switched‐access nature of the register file demands further investigation into the use of optimizing register assignment as a means of increasing instruction‐level parallelism. We propose a heuristic algorithm, named ping‐pong aware local favorable (PALF) register allocation, to obtain a register allocation that is expected to better utilize irregular register file architectures. The results of experiments performed using a compiler based on the Open Research Compiler (ORC) showed significant performance improvement over the original ORCs approach, which is considered to be an optimized approach for common register file architectures. Copyright

languages and compilers for parallel computing | 2005

Compiler supports and optimizations for PAC VLIW DSP processors

Yung-Chia Lin; Chung-Lin Tang; Chung-Ju Wu; Ming-Yu Hung; Yi-Ping You; Ya-Chiao Moo; Sheng-Yuan Chen; Jenq Kuen Lee

PAC DSP is a novel VLIW DSP processor exceedingly utilized with port-restricted, distinct partitioned register file structures in addition to the heterogeneous clustered datapath architecture to attain low power consumption and reduced die size; however, these architectural features lend new challenges to the compiler construction. This paper describes our employment of the Open Research Compiler (ORC) infrastructure on PAC DSP architectures and the specific compilation design. Preliminary results indicated that our compiler development for PAC DSP is effective for the architecture and the evaluation is useful for the refinement of the architecture. Our experiences in designing the compiler support for heterogeneous VLIW DSP processors with irregular resource constraints may benefit the similar architectures.

acm sigplan symposium on principles and practice of parallel programming | 2015

VirtCL: a framework for OpenCL device abstraction and management

Yi-Ping You; Hen-Jung Wu; Yeh-Ning Tsai; Yen-Ting Chao

The interest in using multiple graphics processing units (GPUs) to accelerate applications has increased in recent years. However, the existing heterogeneous programming models (e.g., OpenCL) abstract details of GPU devices at the per-device level and require programmers to explicitly schedule their kernel tasks on a system equipped with multiple GPU devices. Unfortunately, multiple applications running on a multi-GPU system may compete for some of the GPU devices while leaving other GPU devices unused. Moreover, the distributed memory model defined in OpenCL, where each device has its own memory space, increases the complexity of managing the memory among multiple GPU devices. In this article we propose a framework (called VirtCL) that reduces the programming burden by acting as a layer between the programmer and the native OpenCL run-time system for abstracting multiple devices into a single virtual device and for scheduling computations and communications among the multiple devices. VirtCL comprises two main components: (1) a front-end library, which exposes primary OpenCL APIs and the virtual device, and (2) a back-end run-time system (called CLDaemon) for scheduling and dispatching kernel tasks based on a history-based scheduler. The front-end library forwards computation requests to the back-end CLDaemon, which then schedules and dispatches the requests. We also propose a history-based scheduler that is able to schedule kernel tasks in a contention- and communication-aware manner. Experiments demonstrated that the VirtCL framework introduced a small overhead (mean of 6%) but outperformed the native OpenCL run-time system for most benchmarks in the Rodinia benchmark suite, which was due to the abstraction layer eliminating the time-consuming initialization of OpenCL contexts. We also evaluated different scheduling policies in VirtCL with a real-world application (clsurf) and various synthetic workload traces. The results indicated that the VirtCL framework provides scalability for multiple kernel tasks running on multi-GPU systems.

ACM Transactions on Design Automation of Electronic Systems | 2007

Compilation for compact power-gating controls

Yi-Ping You; Chung-Wen Huang; Jenq Kuen Lee

Power leakage constitutes an increasing fraction of the total power consumption in modern semiconductor technologies due to the continuing size reductions and increasing speeds of transistors. Recent studies have attempted to reduce leakage power using integrated architecture and compiler power-gating mechanisms. This approach involves compilers inserting instructions into programs to shut down and wake up components, as appropriate. While early studies showed this approach to be effective, there are concerns about the large amount of power-control instructions being added to programs due to the increasing amount of components equipped with power-gating controls in SoC design platforms. In this article we present a sink-n-hoist framework for a compiler to generate balanced scheduling of power-gating instructions. Our solution attempts to merge several power-gating instructions into a single compound instruction, thereby reducing the amount of power-gating instructions issued. We performed experiments by incorporating our compiler analysis and scheduling policies into SUIF compiler tools and by simulating the energy consumption using Wattch toolkits. The experimental results demonstrate that our mechanisms are effective in reducing the amount of power-gating instructions while further reducing leakage power compared to previous methods.

languages, compilers, and tools for embedded systems | 2007

Enabling compiler flow for embedded VLIW DSP processors with distributed register files

Chung-Kai Chen; Ling-Hua Tseng; Shih-Chang Chen; Young-Jia Lin; Yi-Ping You; Chia-Han Lu; Jenq Kuen Lee

High-performance and low-power VLIW DSP processors are increasingly deployed on embedded devices to process video and multimedia applications. For reducing power and cost in designs of VLIW DSP processors, distributed register files and multi-bank register architectures are being adopted to eliminate the amount of read/write ports in register files. This presents new challenges for devising compiler optimization schemes for such architectures. In this paper, we address the compiler optimization issues for PAC architecture, which is a 5-way issue DSP processor with distributed register files. We present an integrated flow to address several phases of compiler optimizations in interacting with distributed register files and multi-bank register files in the layer of instruction scheduling, software pipelining, and data flow optimizations. Our experiments on a novel 32-bit embedded VLIW DSP (known as the PAC DSP core) exhibit the state of the art performance for embedded VLIW DSP processors with distributed register files by incorporating our proposed schemes in compilers.

embedded and real-time computing systems and applications | 2006

Integrating Compiler and System Toolkit Flow for Embedded VLIW DSP Processors

Chi Wu; Kun-Yuan Hsieh; Yung-Chia Lin; Chung-Ju Wu; Wen-Li Shih; Shih-Chang Chen; Chung-Kai Chen; Chien-Ching Huang; Yi-Ping You; Jenq Kuen Lee

To support high-performance and low-power for multimedia applications and for hand-held devices, embedded VLIW DSP processors are of research focus. With the tight resource constraints, distributed register files, variable-length encodings for instructions, and special data paths are frequently adopted. This creates challenges to deploy software toolkits for new embedded DSP processors. This article presents our methods and experiences to develop software and toolkit flows for PAC (parallel architecture core) VLIW DSP processors. Our toolkits include compilers, assemblers, debugger and DSP micro-kernels. We first retarget open research compiler (ORC) and toolkit chains for PAC VLIW DSP processor and address the issues to support distributed register files and ping-pong data paths for embedded VLIW DSP processors. Second, the linker and assembler are able to support variable length encoding schemes for DSP instructions. In addition, the debugger and DSP micro-kernel were designed to handle dual-core environments. The footprint of micro-kernel is also around 10K to address the code-size issues for embedded devices. We also present the experimental result in the compiler framework by incorporating software pipeline (SWP) policies for distributed register files in PAC architecture. Results indicated that our compiler framework gains performance improvement around 2.5 times against the code generated without our proposed optimizations

embedded software | 2005

A sink-n-hoist framework for leakage power reduction

Yi-Ping You; Chung-Wen Huang; Jenq Kuen Lee

Power leakage constitutes an increasing fraction of the total power consumption in modern semiconductor technologies. Recent research efforts have tried to integrate architecture and compiler solutions to employ power-gating mechanisms to reduce leakage power. This approach is to have compilers perform data-flow analysis and insert instructions at programs to shut down and wake up components whenever appropriate for power reductions. While this approach has been shown to be effective in early studies, there are concerns for the amount of power-control instructions being added to programs with the increasing amount of components equipped with power-gating control in a SoC design platform. In this paper, we present a Sink-N-Hoist framework in the compiler solution to generate balanced scheduling of power-gating instructions. Our solution will attempt to merge power-gating instructions as one compound instruction. Therefore, it will reduce the amount of power-gating instructions issued.We perform experiments by incorporating our compiler analysis and scheduling policies into SUIF compiler tools and by simulating the energy consumptions on Wattch toolkits. The experimental results demonstrate that our mechanisms are effective in reducing the amount of power-gating instructions while further in reducing leakage power compared to previous methods.

ACM Transactions on Design Automation of Electronic Systems | 2014

Compiler Optimization for Reducing Leakage Power in Multithread BSP Programs

Wen-Li Shih; Yi-Ping You; Chung-Wen Huang; Jenq Kuen Lee

Multithread programming is widely adopted in novel embedded system applications due to its high performance and flexibility. This article addresses compiler optimization for reducing the power consumption of multithread programs. A traditional compiler employs energy management techniques that analyze component usage in control-flow graphs with a focus on single-thread programs. In this environment the leakage power can be controlled by inserting on and off instructions based on component usage information generated by flow equations. However, these methods cannot be directly extended to a multithread environment due to concurrent execution issues. This article presents a multithread power-gating framework composed of multithread power-gating analysis (MTPGA) and predicated power-gating (PPG) energy management mechanisms for reducing the leakage power when executing multithread programs on simultaneous multithreading (SMT) machines. Our multithread programming model is based on hierarchical bulk-synchronous parallel (BSP) models. Based on a multithread component analysis with dataflow equations, our MTPGA framework estimates the energy usage of multithread programs and inserts PPG operations as power controls for energy management. We performed experiments by incorporating our power optimization framework into SUIF compiler tools and by simulating the energy consumption with a post-estimated SMT simulator based on Wattch toolkits. The experimental results show that the total energy consumption of a system with PPG support and our power optimization method is reduced by an average of 10.09% for BSP programs relative to a system without a power-gating mechanism on leakage contribution set to 30%; and the total energy consumption is reduced by an average of 4.27% on leakage contribution set to 10%. The results demonstrate our mechanisms are effective in reducing the leakage energy of BSP multithread programs.

Explore More