Bogong Su | Researchain

Publication

Featured research published by Bogong Su.

International Symposium on Microarchitecture | 1986

URPR—An extension of URCR for software pipelining

Bogong Su; Shiyuan Ding; Jinshi Xia

The software pipelining technique is an effective approach to optimizing loops in array processor programs, but existing methods are highly complex and their results may not be satisfactory. This paper introduces the URPR algorithm, an extension of the microcode loop compaction algorithm URCR. First, the loop is unrolled (the number of unrolled loop bodies depends on the inter-body data dependency); second, the unrolled loop bodies are pipelined one by one; finally, a new optimized loop body is obtained after rerolling. Preliminary tests indicate that URPR produces favorable results with lower complexity.
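The unroll, pipeline, and reroll steps described in this abstract can be illustrated with a toy sketch. This is not the URPR algorithm itself: it assumes a hypothetical three-stage loop body, no loop-carried dependencies, unlimited resources, and one new loop body started per cycle. The function names and the stage names are made up for illustration.

```python
def pipeline_schedule(stages, iterations):
    """Unroll `iterations` loop bodies and start one new body per cycle.

    Stage s of iteration i is placed at cycle i + s, so each body keeps its
    own stage order while overlapping with its neighbours.
    """
    schedule = {}
    for i in range(iterations):
        for s, name in enumerate(stages):
            schedule.setdefault(i + s, []).append((name, i))
    return schedule


def rerolled_body(schedule, stages):
    """Return the first cycle in which every stage is busy.

    That cycle's pattern repeats until the loop drains, so it can serve as
    the body of the rerolled loop.
    """
    for cycle in sorted(schedule):
        if len(schedule[cycle]) == len(stages):
            return cycle, schedule[cycle]
    return None


stages = ["load", "multiply", "store"]  # hypothetical three-stage loop body
schedule = pipeline_schedule(stages, iterations=5)
kernel_cycle, kernel = rerolled_body(schedule, stages)
print(kernel_cycle, kernel)
```

In this sketch the cycles before `kernel_cycle` play the role of a prologue and the cycles after the last full cycle play the role of an epilogue; a real algorithm would also have to respect inter-body dependencies and resource limits.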

International Symposium on Microarchitecture | 1987

GURPR—a method for global software pipelining

Bogong Su; Shiyuan Ding; Jian Wang; Jinshi Xia

The software pipelining technique is an effective approach to the optimization of loops in array processor programs and microprograms. In this paper we present a global URPR algorithm, GURPR, which optimizes loops of different structures and is based on the LURPR method we presented in 1986. We start with a brief introduction to LURPR, then discuss in turn the pipelining of loops with abnormal entries, conditional exits, more than one path, nested loops, and subroutine calls. Finally, we present the complete GURPR algorithm.

International Symposium on Microarchitecture | 1993

GPMB—software pipelining branch-intensive loops

Zhizhong Tang; Gang Chen; Chihong Zhang; Yingwei Zhang; Bogong Su; Stanley Habib

To achieve higher instruction-level parallelism, the constraint imposed by a single control flow must be relaxed. Control operations should execute in parallel just like data operations. We present a new software pipelining method called GPMB (Global Pipelining with Multiple Branches) which is based on architectures supporting multi-way branching and multiple control flows. Preliminary experimental results show that, for IFless loops, GPMB performs as well as modulo scheduling, and for branch-intensive loops, GPMB performs much better than software pipelining assuming the constraint of one two-way branch per cycle.

International Symposium on Microarchitecture | 1991

GURPR*: a new global software pipelining algorithm

Bogong Su; Jian Wang

Software pipelining, as an effective loop optimization technique, has been widely used in various optimizing compilers. Although some software pipelining algorithms can optimize complicated loops globally, they still fail to achieve satisfactory time efficiency and space efficiency simultaneously. In this paper, we present a new global software pipelining algorithm, GURPR*, which is applied in the URPR-1 optimizing compiler. Preliminary experiments show that GURPR* has good time efficiency as well as good space efficiency, which is quite important for a single-chip VLIW machine with limited on-chip control memory capacity.

International Journal of Parallel Programming | 1994

Decomposed software pipelining: a new perspective and a new approach

Jian Wang; Christine Eisenbeis; Martin Jourdan; Bogong Su

Software pipelining is an efficient instruction-level loop scheduling technique, but existing software pipelining approaches have not been widely used in practical and commercial compilers. This is mainly because resource constraints and cyclic data dependencies make software pipelining very complicated and difficult to apply. In this paper we present a new perspective on software pipelining in which it is decomposed into two subproblems: one is free from cyclic data dependencies and can be effectively solved by the list scheduling technique, and the other is free from resource constraints and can be easily solved by classical polynomial-time algorithms of graph theory. Based on this new perspective, we develop a new instruction-level loop scheduling approach, called DEcomposed Software Pipelining (DESP).
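The first subproblem mentioned in this abstract, scheduling an acyclic dependence graph under resource constraints, is commonly handled by list scheduling. The following is a minimal generic list scheduler, not the DESP algorithm: it assumes unit-latency operations, a fixed number of identical functional units, and alphabetical order as the priority function. The operation names and the `list_schedule` helper are hypothetical.

```python
def list_schedule(deps, units):
    """Greedy list scheduling of an acyclic dependence graph.

    deps maps each operation to the set of operations it depends on.
    Every operation takes one cycle, and at most `units` operations may
    start in the same cycle. Returns a dict: operation -> start cycle.
    """
    cycle_of = {}
    remaining = set(deps)
    cycle = 0
    while remaining:
        # An operation is ready once all its predecessors finished earlier.
        ready = sorted(
            op for op in remaining
            if all(p in cycle_of and cycle_of[p] < cycle for p in deps[op])
        )
        for op in ready[:units]:  # respect the resource limit
            cycle_of[op] = cycle
            remaining.discard(op)
        cycle += 1
    return cycle_of


# Hypothetical loop body: c needs a and b, d needs c, e is independent.
deps = {"a": set(), "b": set(), "c": {"a", "b"}, "d": {"c"}, "e": set()}
result = list_schedule(deps, units=2)
print(result)
```

With two units, `e` cannot start in cycle 0 because `a` and `b` already occupy both units, so it is deferred to cycle 1 alongside `c`. The input must be acyclic; a cyclic graph would never become ready and the loop would not terminate.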

International Symposium on Microarchitecture | 1984

An improvement of trace scheduling for global microcode compaction

Bogong Su; Shiyuan Ding; Lan Jin

Fisher's trace scheduling procedure for global compaction has proven able to produce significant reductions in the execution time of compacted microcode; however, extra space may sometimes be required during bookkeeping, and the efficacy of compaction of microprogram loops is lower than that of hand compaction. This paper introduces an improved trace scheduling compaction algorithm to mitigate the drawbacks mentioned above. The improved algorithm is based on a modified menu of moving microoperations, an improved trace scheduling algorithm, and a special loop compaction algorithm. Preliminary tests indicate that this global compaction algorithm gives shorter execution time and lower space requirements than Fisher's algorithm.

International Symposium on Microarchitecture | 1987

Microcode compaction with timing constraints

Bogong Su; Shiyuan Ding; Jian Wang; Jinshi Xia

At present, microcode compaction with timing constraints (abbreviated as MCTC) is still an open problem. Complex timing relations between microoperations greatly affect the optimization result of microcode. This paper begins with a survey of MCTC problems, then presents a formal description of MCTC and, on the basis of a systematic study of the characteristics of MCTC, presents a generally oriented heuristic algorithm, CAS, which has a high scheduling success rate and promises good optimization results. Preliminary experiments indicate that CAS is better than other existing MCTC algorithms.

International Symposium on Microarchitecture | 1985

Some experiments in global microcode compaction

Bogong Su; Shiyuan Ding

Global microcode compaction is an open problem in firmware engineering. Although Fisher's trace scheduling method may produce significant reductions in the execution time of compacted microcode, it has some drawbacks. Four methods, Tree, SRDAG, ITSC, and GDDG, have been presented recently to mitigate those drawbacks in different ways. The purpose of the research reported in this paper is to evaluate these new methods. To do this, we have tested the published algorithms on several unified microcode sequences of two real machines and compared them on the basis of the experimental results using three criteria: time efficiency, space efficiency, and complexity.

International Symposium on Microarchitecture | 1990

A software pipelining based VLIW architecture and optimizing compiler

Bogong Su; Jian Wang; Zhizhong Tang; Wei Zhao; Yimin Wu

This paper introduces a VLIW architecture and its optimizing compiler, both of which are now under development. Based on the URPR software pipelining approach, the architecture integrates nine PEs with the same structure on a single chip. In addition, a pipeline register file is used to reduce the inter-body dependence distance, which enhances the overlapping of adjacent loop iterations and further shortens the optimized loop body. The pipeline register file also increases the bandwidth between PEs. The optimizing compiler is also based on the URPR software pipelining approach. It uses a two-level software pipelining method to implement phase-coupled resource allocation and code optimization, and obtains good results in both time and space. A compilation example of an FFT innermost loop is discussed. The simulation results indicate that the architecture could reach high performance with the aid of the optimizing compiler.

International Conference on Acoustics, Speech, and Signal Processing | 1998

Software pipelining of nested loops for real-time DSP applications

Jian Wang; Bogong Su

Modern DSP processors have been integrated with instruction-level parallelism (ILP), which makes exploiting ILP within DSP applications a challenge. Software pipelining is an efficient technique used to expose ILP in loop programs and has been widely used for current microprocessors. It has also been used in DSP compilers, but only for the innermost loops. This paper proposes a new approach which extends software pipelining from innermost loops to whole nested loops in DSP applications. Given a perfect loop, we apply an existing software pipelining approach to the innermost loops, then use the so-called pipelining-dovetailing transformation to extend software pipelining to the outer loops. We also present a transformation to convert a non-perfect nested loop into a perfect one. We have verified the above transformations with some nested loops selected from DSP compiler-challenge C code. Preliminary results are also presented in this paper.
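To make the distinction between non-perfect and perfect nested loops concrete, the sketch below shows one common way to turn a non-perfect nest into a perfect one: statements that sit between the loop headers are moved into the inner loop and guarded by the first or last inner iteration. This is a generic illustration and not necessarily the transformation used in the paper; it assumes every row has at least one element, and both function names are hypothetical.

```python
def row_sums_imperfect(matrix):
    """Non-perfect nest: statements appear between the two loop levels."""
    out = []
    for row in matrix:
        total = 0              # runs in the outer loop only
        for value in row:
            total += value
        out.append(total)      # runs in the outer loop only
    return out


def row_sums_perfect(matrix):
    """Perfect nest: every statement lives in the innermost loop body."""
    out = []
    total = 0
    for i in range(len(matrix)):
        for j in range(len(matrix[i])):
            if j == 0:                       # guarded former prologue
                total = 0
            total += matrix[i][j]
            if j == len(matrix[i]) - 1:      # guarded former epilogue
                out.append(total)
    return out


sample = [[1, 2, 3], [4, 5, 6]]
print(row_sums_imperfect(sample), row_sums_perfect(sample))
```

The guarded form does the same work but exposes a single loop body, which is the shape that an innermost-loop software pipeliner expects.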
