
Xiaobo Yan

National University of Defense Technology


Publication

Featured research published by Xiaobo Yan.

International Symposium on Computer Architecture | 2007

A 64-bit stream processor architecture for scientific applications

Xuejun Yang; Xiaobo Yan; Zuocheng Xing; Yu Deng; Jiang Jiang; Ying Zhang

Stream architecture is a novel microprocessor architecture with wide application potential, but whether it can be used efficiently in scientific computing remains an open question. This paper first presents the design and implementation of FT64 (Fei Teng 64), a 64-bit stream processor for scientific computing. The 64-bit extension and the optimizations oriented toward scientific computing are described for the instruction set architecture, stream controller, microcontroller, ALU cluster, memory hierarchy, and interconnection interface. Second, two kinds of communication, message passing and stream communication, are proposed, and an interconnection based on them is designed for FT64-based high-performance computers. Third, a novel stream programming language, SF95 (Stream FORTRAN95), and its compiler, SF95Compiler (Stream FORTRAN95 Compiler), are developed to facilitate the development of scientific applications. Finally, nine typical scientific application kernels are tested, and the results show the efficiency of stream architecture for scientific computing.

Languages, Compilers, and Tools for Embedded Systems | 2008

Optimizing scientific application loops on stream processors

Li Wang; Xuejun Yang; Jingling Xue; Yu Deng; Xiaobo Yan; Tao Tang; Quan Hoang Nguyen

This paper describes a graph coloring compiler framework to allocate on-chip SRF (Stream Register File) storage for optimizing scientific applications on stream processors. Our framework consists of first applying enabling optimizations such as loop unrolling to expose stream reuse and opportunities for maximizing parallelism, i.e., overlapping kernel execution and memory transfers. Then the three SRF management tasks are solved in a unified manner via graph coloring: (1) placing streams in the SRF, (2) exploiting stream use, and (3) maximizing parallelism. We evaluate the performance of our compiler framework by actually running nine representative scientific computing kernels on our FT64 stream processor. Our preliminary results show that compiler management achieves an average speedup of 2.3x compared to First-Fit allocation. In comparison with the performance results obtained from running these benchmarks on Itanium 2, an average speedup of 2.1x is observed.

International Symposium on Parallel and Distributed Processing and Applications | 2006

Matrix-based programming optimization for improving memory hierarchy performance on Imagine

Xuejun Yang; Jing Du; Xiaobo Yan; Yu Deng

Although Imagine provides an efficient memory hierarchy, straightforward programming of scientific applications does not match that hierarchy and thereby constrains the performance of stream applications. In this paper, we explore a novel matrix-based programming optimization that improves memory hierarchy performance so that it can sustain the operands needed for highly parallel computation. Specifically, we formulate the problem on the Data&Computation Matrix (D&C Matrix), which is proposed to abstract the relationship between streams and kernels, and we present the key techniques for improving multilevel bandwidth utilization based on this matrix. The experimental evaluation on five representative scientific applications shows that the new stream programs yielded by our optimization can effectively enhance locality in the LRF and SRF, improve the capacity utilization of the LRF and SRF, make the best use of SPs and SBs, and avoid index stream overhead.

IEEE Transactions on Parallel and Distributed Systems | 2009

Fei Teng 64 Stream Processing System: Architecture, Compiler, and Programming

Xuejun Yang; Xiaobo Yan; Zuocheng Xing; Yu Deng; Jiang Jiang; Jing Du; Ying Zhang

The stream architecture is a novel microprocessor architecture with wide application potential. It is critical to study how to use the stream architecture to accelerate scientific computing programs. However, existing stream processors and stream programming languages are not designed for scientific computing. To address this issue, we design and implement a 64-bit stream processor, Fei Teng 64 (FT64), which has a peak performance of 16 Gflops. FT64 supports two kinds of communication, message passing and stream communication, on which we base an interconnection architecture for an FT64-based high-performance computer. This high-performance computer contains multiple modules, with each module containing eight FT64s. We also design a novel stream programming language, Stream Fortran 95 (SF95), together with its compiler, the SF95 compiler, to facilitate the development of scientific applications. We test nine typical scientific application kernels on our FT64 platform to evaluate this design. The results demonstrate the effectiveness and efficiency of FT64 and its compiler for scientific computing.

The Journal of Supercomputing | 2009

Matrix-based streamization approach for improving locality and parallelism on FT64 stream processor

Xuejun Yang; Jing Du; Xiaobo Yan; Yu Deng

FT64 is the first 64-bit stream processor designed for scientific computing. Because direct streamization is inefficient, optimized streamization approaches are critical for scientific applications on FT64. In this paper, we propose a novel matrix-based streamization approach for improving the locality and parallelism of scientific applications on FT64. First, a Data&Computation Matrix is built to abstract the relationship between the loops and arrays of the original programs, which helps in formulating the streamization problem. Second, three key techniques for optimizing streamization are proposed based on transformations of the matrix: coarse-grained program transformations, fine-grained program transformations, and stream organization optimizations. Finally, we apply our approach to ten typical scientific application kernels on FT64. The experimental results show that the matrix-based streamization approach achieves an average speedup of 2.76 over the direct streamization approach, and performs as well as or better than the corresponding Fortran programs on Itanium 2 for every kernel except CG. The results indicate that the matrix-based streamization approach is a promising and practical way to exploit the potential of FT64 efficiently.
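
To make the matrix idea concrete, the following is a minimal Python sketch of one way a loop-by-array matrix could be represented. The abstract does not define the matrix entries, so the encoding used here ("R", "W", "RW", or empty) and the helper names `build_dc_matrix` and `producer_consumer_pairs` are assumptions made for illustration only, not the authors' formulation.

```python
from typing import Dict, List, Tuple

# Hypothetical loop description: loop name -> (arrays read, arrays written).
Loops = Dict[str, Tuple[List[str], List[str]]]


def build_dc_matrix(loops: Loops) -> Tuple[List[str], List[str], List[List[str]]]:
    """Return (loop names, array names, matrix) with one row per loop.

    Each cell is "", "R", "W", or "RW" (an assumed encoding).
    """
    loop_names = list(loops)  # keeps the program order of the loops
    arrays = sorted({a for reads, writes in loops.values() for a in reads + writes})
    matrix = []
    for name in loop_names:
        reads, writes = loops[name]
        row = [("R" if a in reads else "") + ("W" if a in writes else "")
               for a in arrays]
        matrix.append(row)
    return loop_names, arrays, matrix


def producer_consumer_pairs(loops: Loops) -> List[Tuple[str, str, str]]:
    """List (producer, consumer, array) for adjacent loops that share an array.

    Such pairs are candidates for keeping the array on chip between loops.
    """
    names, arrays, m = build_dc_matrix(loops)
    pairs = []
    for i in range(len(names) - 1):
        for j, a in enumerate(arrays):
            if "W" in m[i][j] and "R" in m[i + 1][j]:
                pairs.append((names[i], names[i + 1], a))
    return pairs


example = {
    "loop1": (["a", "b"], ["c"]),  # c = f(a, b)
    "loop2": (["c"], ["d"]),       # d = g(c)
}
print(build_dc_matrix(example)[2])
print(producer_consumer_pairs(example))
```

In this toy encoding, a column that is written in one row and read in the next marks data that could stay in on-chip storage instead of returning to memory, which is the kind of locality the abstract refers to.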

Journal of Computer Science and Technology | 2009

SRF coloring: stream register file allocation via graph coloring

Xuejun Yang; Yu Deng; Li Wang; Xiaobo Yan; Jing Du; Ying Zhang; Guibin Wang; Tao Tang

The Stream Register File (SRF) is a large on-chip memory of the stream processor, and its efficient management is essential for good performance. Current stream programming languages expose the management of the SRF to the programmer, which places a heavy burden on the programmer and makes it difficult to port legacy code. SF95 is the language developed for FT64, the first 64-bit stream processor designed for scientific applications. SF95 conceals the SRF from the programmer and leaves its management to the compiler. In this paper, we present a compiler approach named SRF Coloring that manages the SRF automatically. The novelties of this paper are as follows. First, it is the first use of a graph coloring-based algorithm for SRF management. Second, it proposes an algorithm framework for SRF Coloring that is well suited to the FT64 architecture; the framework is based on a well-understood graph coloring algorithm for register allocation, with some modifications to deal with the unusual aspects of the SRF problem. Third, the SRF Coloring algorithm is implemented in SF95Compiler, a compiler designed for FT64 and SF95. The experimental results show that our approach represents a practical and promising solution to SRF allocation.
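
As a rough illustration of graph coloring applied to allocation, the Python sketch below uses a generic simplify/select scheme in the style of classic register allocators. It assumes `k` equal-sized SRF slots and a symmetric interference graph in which two streams are connected when they are live at the same time. Both assumptions, and the function name `color_streams`, are hypothetical; the abstract does not describe the modifications the authors made for variable-sized streams and other SRF-specific aspects.

```python
from typing import Dict, List, Optional, Set


def color_streams(interference: Dict[str, Set[str]], k: int) -> Dict[str, Optional[int]]:
    """Assign each stream a slot in range(k), or None if it must be spilled.

    `interference` must be symmetric: b in interference[a] iff a in interference[b].
    """
    graph = {n: set(adj) for n, adj in interference.items()}
    stack: List[str] = []
    remaining = set(graph)

    # Simplify: repeatedly remove a node with fewer than k remaining neighbours.
    # If none exists, remove the highest-degree node as a spill candidate.
    while remaining:
        low = [n for n in remaining if len(graph[n] & remaining) < k]
        if low:
            node = min(low)  # min() only makes the choice deterministic
        else:
            node = max(remaining, key=lambda n: (len(graph[n] & remaining), n))
        stack.append(node)
        remaining.remove(node)

    # Select: pop nodes in reverse order and give each the lowest free slot.
    colors: Dict[str, Optional[int]] = {}
    while stack:
        node = stack.pop()
        used = {colors[m] for m in graph[node] if colors.get(m) is not None}
        free = [c for c in range(k) if c not in used]
        colors[node] = free[0] if free else None
    return colors


# Streams a, b, c are mutually live; d overlaps only with a.
g = {"a": {"b", "c", "d"}, "b": {"a", "c"}, "c": {"a", "b"}, "d": {"a"}}
print(color_streams(g, 3))
print(color_streams(g, 2))
```

With three slots every stream in the example gets a slot; with two slots one of the three mutually interfering streams cannot be placed and is reported as `None`, which a real allocator would handle by spilling to memory or splitting the live range.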

International Conference on Parallel Processing | 2007

Evaluation of Transcendental Functions on Imagine Architecture

Xiaobo Yan; Tao Tang; Yu Deng; Jing Du; Xuejun Yang

The fast and accurate evaluation of transcendental functions (e.g., exp, log, sin, and atan) is important in many domains. We implement a software inline function library, callable from the KernelC programming language, that computes eight typical functions on the Imagine architecture. By exploiting some key features of the Imagine architecture, we provide single-precision transcendental functions that are very accurate and that typically produce 16 function values in 18 to 43 clock cycles. In this paper, we also discuss the algorithms and implementation details of these functions.
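
The abstract does not state which algorithms the library uses. As a general illustration of how software routines commonly evaluate such functions, the Python sketch below computes exp(x) by range reduction followed by a short polynomial: write x = n·ln 2 + r with |r| ≤ (ln 2)/2, approximate exp(r) with a degree-7 Taylor polynomial, and scale by 2^n. The function name `exp_approx` and the choice of polynomial are assumptions for illustration, and the code runs in Python double precision rather than KernelC.

```python
import math

LN2 = math.log(2.0)

# Taylor coefficients 1/i! for i = 0..7 (an assumed, simple choice of polynomial).
_COEFFS = [1.0 / math.factorial(i) for i in range(8)]


def exp_approx(x: float) -> float:
    """Approximate exp(x) via range reduction and Horner evaluation."""
    n = round(x / LN2)      # integer multiple of ln 2 nearest to x
    r = x - n * LN2         # reduced argument, |r| <= ln(2)/2
    p = 0.0
    for c in reversed(_COEFFS):
        p = p * r + c       # Horner's rule: one multiply-add per coefficient
    return math.ldexp(p, n)  # p * 2**n


for x in (-10.0, -1.0, 0.0, 0.5, 3.0, 20.0):
    rel_err = abs(exp_approx(x) - math.exp(x)) / math.exp(x)
    print(x, rel_err)
```

The fixed-length chain of multiply-adds has no data-dependent branches, which is one reason this style of evaluation maps well onto wide arithmetic clusters that process many inputs at once.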

International Conference for Young Computer Scientists | 2008

A Double-Buffering Strategy for SRF Management in the Imagine Stream Processor

Yu Deng; Li Wang; Xiaobo Yan; Xuejun Yang

The stream register file (SRF) is fast on-chip storage in the Imagine stream processor, and its efficient management is essential for good performance. Double-buffering is an important strategy for managing the SRF efficiently. This paper introduces a double-buffering strategy that aims to reduce both the overhead of double-buffering and the number of off-chip memory transfers. In contrast to the current strategy, it uses a new heuristic to determine the optimal buffer size and minimize the overhead of double-buffering. In addition, a reusing-first strategy is presented to reduce off-chip memory transfers. Preliminary results on several stream programs show that our strategy represents a promising solution for double-buffering.
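
For readers unfamiliar with the basic technique, the Python sketch below models why double-buffering helps: with two buffers, the load of block i+1 can proceed while the kernel computes on block i. This is a simple timing model under assumed per-block load and compute times; it does not model the buffer-size heuristic or the reusing-first strategy from the paper, and the function names are hypothetical.

```python
from typing import List


def serial_time(load: List[float], compute: List[float]) -> float:
    """Single buffer: every load must finish before its compute, with no overlap."""
    return sum(load) + sum(compute)


def double_buffered_time(load: List[float], compute: List[float]) -> float:
    """Two buffers: the load of block i+1 overlaps the compute of block i."""
    if not load:
        return 0.0
    total = load[0]  # the first load cannot be hidden
    for i in range(len(load)):
        next_load = load[i + 1] if i + 1 < len(load) else 0.0
        # Both activities start together; the step ends when the slower one does.
        total += max(compute[i], next_load)
    return total


loads = [2.0, 2.0, 2.0]
computes = [3.0, 3.0, 3.0]
print(serial_time(loads, computes))           # 15.0
print(double_buffered_time(loads, computes))  # 11.0
```

In the compute-bound example, all loads after the first are hidden behind computation. When loads dominate instead, the total approaches the sum of the load times, which is why reducing off-chip transfers matters as well.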

International Conference on Parallel and Distributed Systems | 2007

Efficient generation of stream programs from loops

Xuejun Yang; Yu Deng; Xiaobo Yan; Li Wang; Jing Du; Ying Zhang

The efficiency of scientific applications on the Imagine stream processor is of increasing concern to researchers. One obstacle is that Imagine's programming language does not target scientific computing. This paper introduces a program transformation algorithm that automatically transforms loops into stream programs that execute on Imagine. Memory access optimization is also considered during the transformation. We have implemented the transformation and optimization algorithm with the GFORTRAN frontend. Preliminary results on benchmark kernels show that our approach is a convenient and efficient way to develop scientific applications on the Imagine stream processor.

Archive | 2007

A Locality Optimizing Algorithm for Developing Stream Programs in Imagine

Jing Du; Xuejun Yang; Canqun Yang; Xiaobo Yan; Yu Deng

In this paper, we explore a novel locality optimizing algorithm for developing stream programs in Imagine that sustain high computational performance. Specifically, we formulate the relationship between streams and kernels as a Data&Computation Matrix (D&C Matrix) and present the key techniques for locality enhancement based on this matrix. The experimental results on five representative scientific applications show that our algorithm can effectively improve computational intensity and avoid the use of index streams, achieving high locality in the LRF and SRF.
