Jenq Kuen Lee | Researchain

Archive Network Publication Hotspot Collaboration

Network

Latest external collaboration on country level. Dive into details by clicking on the dots.

Explore More

Hotspot

Dive into the research topics where Jenq Kuen Lee is active.

Explore More

Publication

Featured researches published by Jenq Kuen Lee.

acm sigplan symposium on principles and practice of parallel programming | 2003

Compiler support for speculative multithreading architecture with probabilistic points-to analysis

Peng-Sheng Chen; Ming-Yu Hung; Yuan-Shin Hwang; Roy Dz-Ching Ju; Jenq Kuen Lee

Speculative multithreading (SpMT) architecture can exploit thread-level parallelism that cannot be identified statically. Speedup can be obtained by speculatively executing threads in parallel that are extracted from a sequential program. However, performance degradation might happen if the threads are highly dependent, since a recovery mechanism will be activated when a speculative thread executes incorrectly and such a recovery action usually incurs a very high penalty. Therefore, it is essential for SpMT to quantify the degree of dependences and to turn off speculation if the degree of dependences passes certain thresholds. This paper presents a technique that quantitatively computes dependences between loop iterations and such information can be used to determine if loop iterations can be executed in parallel by speculative threads. This technique can be broken into two steps. First probabilistic points-to analysis is performed to estimate the probabilities of points-to relationships in case there are pointer references in programs, and then the degree of dependences between loop iterations is computed quantitatively. Preliminary experimental results show compiler-directed thread-level speculation based on the information gathered by this technique can achieve significant performance improvement on SpMT.

ACM Transactions on Design Automation of Electronic Systems | 2006

Compilers for leakage power reduction

Yi-Ping You; Chingren Lee; Jenq Kuen Lee

Power leakage constitutes an increasing fraction of the total power consumption in modern semiconductor technologies. Recent research efforts indicate that architectures, compilers, and software can be optimized so as to reduce the switching power (also known as dynamic power) in microprocessors. This has lead to interest in using architecture and compiler optimization to reduce leakage power (also known as static power) in microprocessors. In this article, we investigate compiler-analysis techniques that are related to reducing leakage power. The architecture model in our design is a system with an instruction set to support the control of power gating at the component level. Our compiler provides an analysis framework for utilizing instructions to reduce the leakage power. We present a framework for analyzing data flow for estimating the component activities at fixed points of programs whilst considering pipeline architectures. We also provide equations that can be used by the compiler to determine whether employing power-gating instructions in given program blocks will reduce the total energy requirements. As the duration of power gating on components when executing given program routines is related to the number and complexity of program branches, we propose a set of scheduling policies and evaluate their effectiveness. We performed experiments by incorporating our compiler analysis and scheduling policies into SUIF compiler tools and by simulating the energy consumptions on Wattch toolkits. The experimental results demonstrate that our mechanisms are effective in reducing leakage power in microprocessors.

acm sigplan symposium on principles and practice of parallel programming | 1995

An array operation synthesis scheme to optimize Fortran 90 programs

Gwan Hwan Hwang; Jenq Kuen Lee; Dz Ching Ju

An increasing number of programming languages, such as Fortran 90 and APL, are providing a rich set of intrinsic array functions and array expressions. These constructs which constitute an important part of data parallel languages provide excellent opportunities for compiler optimizations. In this paper, we present a new approach to combine consecutive data access patterns of array constructs into a composite access function to the source arrays. Our scheme is based on the composition of access functions, which is similar to a composition of mathematic functions. Our new scheme can handle not only data movements of arrays of different numbers of dimensions and segmented array operations but also masked array expressions and multiple sources array operations. As a result, our proposed scheme is the first synthesis scheme which can synthesize Fortran 90 RESHAPE, EOSHIFT, MERGE, and WHERE constructs together. Experimental results show speedups from 1.21 to 2.95 for code fragments from real applications on a Sequent multiprocessor machine by incorporating the proposed optimizations.

The Journal of Supercomputing | 2001

Parallel Sparse Supports for Array Intrinsic Functions of Fortran 90

Rong-Guey Chang; Tyng-Ruey Chuang; Jenq Kuen Lee

Fortran 90 provides a rich set of array intrinsic functions. Each of these array intrinsic functions operates on the elements of multi-dimensional array objects concurrently. They provide a rich source of parallelism and play an increasingly important role in automatic support of data parallel programming. However, there is no such support if these intrinsic functions are applied to sparse data sets. In this paper, we address this open gap by presenting an efficient library for parallel sparse computations with Fortran 90 array intrinsic operations. Our method provides both compression schemes and distribution schemes on distributed memory environments applicable to higher-dimensional sparse arrays. This way, programmers need not worry about low-level system details when developing sparse applications. Sparse programs can be expressed concisely using array expressions, and parallelized with the help of our library. Our sparse libraries are built for array intrinsics of Fortran 90, and they include an extensive set of array operations such as CSHIFT, EOSHIFT, MATMUL, MERGE, PACK, SUM, RESHAPE, SPREAD, TRANSPOSE, UNPACK, and section moves. Our work, to our best knowledge, is the first work to give sparse and parallel sparse supports for array intrinsics of Fortran 90. In addition, we provide a complete complexity analysis for our sparse implementation. The complexity of our algorithms is in proportion to the number of nonzero elements in the arrays, and that is consistent with the conventional design criteria for sparse algorithms and data structures. Our current testbed is an IBM SP2 workstation cluster. Preliminary experimental results with numerical routines, numerical applications, and data-intensive applications related to OLAP (on-line analytical processing) show that our approach is promising in speeding up sparse matrix computations on both sequential and distributed memory environments if the programs are expressed with Fortran 90 array expressions.

IEEE Transactions on Parallel and Distributed Systems | 2004

Interprocedural probabilistic pointer analysis

Peng-Sheng Chen; Yuan-Shin Hwang; Roy Dz-Ching Ju; Jenq Kuen Lee

When performing aggressive optimizations and parallelization to exploit features of advanced architectures, optimizing and parallelizing compilers need to quantitatively assess the profitability of any transformations in order to achieve high performance. Useful optimizations and parallelization can be performed if it is known that certain points-to relationships would hold with high or low probabilities. For instance, if the probabilities are low, a compiler could transform programs to perform data speculation or partition iterations into threads in speculative multithreading, or it would avoid conducting code specialization. Consequently, it is essential for compilers to incorporate pointer analysis techniques that can estimate the possibility for every points-to relationship that it would hold during the execution. However, conventional pointer analysis techniques do not provide such quantitative descriptions and, thus, hinder compilers from more aggressive optimizations, such as thread partitioning in speculative multithreading, data speculations, code specialization, etc. We address this issue by proposing a probabilistic points-to analysis technique to compute the probability of every points-to relationship at each program point. A context-sensitive interprocedural algorithm has been implemented based on the iterative data flow analysis framework, and has been incorporated into SUIF and MachSUIF. Experimental results show this technique can estimate the probabilities of points-to relationships in benchmark programs with reasonable small errors, about 4.6 percent on average. Furthermore, the current implementation cannot disambiguate heap and array elements. The errors are further significantly reduced when the future implementation incorporates techniques to disambiguate heap and array elements.

languages and compilers for parallel computing | 2002

Compiler analysis and supports for leakage power reduction on microprocessors

Yi-Ping You; Chingren Lee; Jenq Kuen Lee

Power leakage constitutes an increasing fraction of the total power consumption in modern semiconductor technologies. Recent research efforts also indicate architecture, compiler, and software participations can help reduce the switching activities (also known as dynamic power) on microprocessors. This raises interests on the issues to employ architecture and compiler efforts to reduce leakage power (also known as static power) on microprocessors. In this paper, we investigate the compiler analysis techniques related to reducing leakage power. The architecture model in our design is a system with an instruction set to support the control of power gating in the component levels. Our compiler gives an analysis framework to utilize the instruction to reduce the leakage power. We present a data flow analysis framework to estimate the component activities at fixed points of programs with the consideration of pipelines of architectures. We also give the equation for the compiler to decide if the employment of the power gating instructions on given program blocks will benefit the total energy reductions. As the duration of power gating on components on given program routines is related to program branches, we propose a set of scheduling policy include Basic_Blk_Sched, MIN_Path_Sched, and AVG_Path_Sched mechanisms and evaluate the effectiveness of those schemes. Our experiment is done by incorporating our compiler analysis and scheduling policy into SUIF compiler tools [32] and by simulating the energy consumptions on Wattch toolkits [6]. Experimental results show our mechanisms are effective in reducing leakage powers on microprocessors.

Concurrency and Computation: Practice and Experience | 2007

PALF: compiler supports for irregular register files in clustered VLIW DSP processors

Yung-Chia Lin; Yi-Ping You; Jenq Kuen Lee

A wide variety of register file architectures—developed for embedded processors—have recently been used with the aim of reducing power dissipation and die size, in contrast with the traditional unified register file structures. This article presents a novel register allocation scheme for a clustered VLIW DSP, which is designed with distinctively banked register files in which port access is highly restricted. Whilst the organization of the register files is designed to decrease power consumption by using fewer port connections, the cluster‐based design makes register access across clusters an additional issue, and the switched‐access nature of the register file demands further investigation into the use of optimizing register assignment as a means of increasing instruction‐level parallelism. We propose a heuristic algorithm, named ping‐pong aware local favorable (PALF) register allocation, to obtain a register allocation that is expected to better utilize irregular register file architectures. The results of experiments performed using a compiler based on the Open Research Compiler (ORC) showed significant performance improvement over the original ORCs approach, which is considered to be an optimized approach for common register file architectures. Copyright

languages and compilers for parallel computing | 2005

Compiler supports and optimizations for PAC VLIW DSP processors

Yung-Chia Lin; Chung-Lin Tang; Chung-Ju Wu; Ming-Yu Hung; Yi-Ping You; Ya-Chiao Moo; Sheng-Yuan Chen; Jenq Kuen Lee

PAC DSP is a novel VLIW DSP processor exceedingly utilized with port-restricted, distinct partitioned register file structures in addition to the heterogeneous clustered datapath architecture to attain low power consumption and reduced die size; however, these architectural features lend new challenges to the compiler construction. This paper describes our employment of the Open Research Compiler (ORC) infrastructure on PAC DSP architectures and the specific compilation design. Preliminary results indicated that our compiler development for PAC DSP is effective for the architecture and the evaluation is useful for the refinement of the architecture. Our experiences in designing the compiler support for heterogeneous VLIW DSP processors with irregular resource constraints may benefit the similar architectures.

languages and compilers for parallel computing | 2001

Probabilistic points-to analysis

Yuan-Shin Hwang; Peng-Sheng Chen; Jenq Kuen Lee; Roy Dz-Ching Ju

Information gathered by the existing pointer analysis techniques can be classified as must aliases or definitely-points-to relationships, which hold for all executions, and may aliases or possibly-points-to relationships, which might hold for some executions. Such information does not provide quantitative descriptions to tell how likely the conditions will hold for the executions, which are needed for modern compiler optimizations, and thus has hindered compilers from more aggressive optimizations. This paper addresses this issue by proposing a probabilistic points-to analysis technique to compute the probability of each points-to relationship. Initial experiments are done by incorporating the probabilistic data flow analysis algorithm into SUIF and MachSUIF, and preliminary experimental results show the probability distributions of points-to relationships in several benchmark programs. This work presents a major enhancement for pointer analysis to keep up with modern compiler optimizations.

Journal of Parallel and Distributed Computing | 1998

A Function-Composition Approach to Synthesize Fortran 90 Array Operations

Gwan Hwan Hwang; Jenq Kuen Lee; Roy Dz Ching Ju

An increasing number of programming languages, such as Fortran 90 and APL, are providing a rich set of intrinsic array functions and array expressions. These constructs which constitute an important part of data parallel languages provide excellent opportunities for compiler optimizations. In this paper, we present a new approach to combine consecutive array operations or array expressions into a composite access function of the source arrays. Our scheme is based on the composition of access functions, which is analogous to a composition of mathematic functions. Our new scheme can handle not only data movements of arrays with different numbers of dimensions and with multiple-clause array operations but also masked array expressions and multiple-source array operations. As a result, our proposed scheme is the first synthesis scheme which can collectively synthesize Fortran 90 RESHAPE, EOSHIFT, MERGE, array reduction operations, and WHERE constructs. In addition, we also discuss the case that the synthesis scheme may result in a performance anomaly in the presence of common subexpressions and one-to-many array operations. A solution is proposed to avoid such a performance anomaly. Experimental results show speedups from 1.21 to 2.95 over the base code for code fragments from real applications on a Sequent multiprocessor machine and also show comparable performance improvements on an 8-node SGI Power Challenge by incorporating our proposed optimizations

Explore More