
Publication


Featured research published by Xinan Tang.


international conference on parallel architectures and compilation techniques | 1996

Compiling C for the EARTH multithreaded architecture

Laurie J. Hendren; Xinan Tang; Yingchun Zhu; Guang R. Gao; Xun Xue; Haiying Cai; Pierre Ouellet

Multithreaded architectures provide an opportunity for efficiently executing programs with irregular parallelism and/or irregular locality. This paper presents a strategy that makes use of the multithreaded execution model without exposing multithreading to the programmer. Our approach is to design simple extensions to C, and to provide compiler support that automatically translates high-level C programs into lower-level threaded programs. In this paper we present EARTH-C, our extended C language, which contains simple constructs for specifying control parallelism, data locality, shared variables and atomic operations. Based on EARTH-C, we describe compiler techniques that are used for translating to lower-level Threaded-C programs for the EARTH multithreaded architecture. We demonstrate our approach with six benchmark programs. We show that even naive EARTH-C programs can lead to reasonable performance, and that more advanced EARTH-C programs can give performance very close to hand-coded Threaded-C programs.


acm symposium on parallel algorithms and architectures | 1997

Thread partitioning and scheduling based on cost model

Xinan Tang; Jing Wang; Kevin B. Theobald; Guang R. Gao

There has been considerable interest in implementing a multithreaded program execution and architecture model on a multiprocessor whose primary processors consist of today’s off-the-shelf microprocessors. Unlike some custom-designed multithreaded processor architectures, which can interleave multiple threads concurrently, conventional processors can only execute one thread at a time. This presents a unique and challenging problem to the compiler: partition a program into threads so that it executes both correctly and in minimal time. We present a new heuristic algorithm based on an interesting extension of the classical list scheduling algorithm. Based on a cost model, our algorithm groups instructions into threads by considering the trade-offs among parallelism, latency tolerance, thread switching costs and sequential execution efficiency. The proposed algorithm has been implemented, and its performance measured through experiments on a variety of architecture parameters and a wide range of program parameters. The results show that the proposed algorithm is robust, effective, and efficient.


international conference on parallel architectures and compilation techniques | 1997

Heap analysis and optimizations for threaded programs

Xinan Tang; Rakesh Ghiya; Laurie J. Hendren; Guang R. Gao

Traditional compiler optimizations such as loop invariant removal and common sub-expression elimination are standard in all optimizing compilers. The purpose of the paper is to present new versions of these optimizations that apply to programs using dynamically allocated data structures, and to show the effect of these optimizations on the performance of multithreaded programs. We show how heap pointer analyses can be used to support better dependence testing, new applications of the above traditional optimizations, and high quality code generation for multithreaded architectures. We have implemented these analyses and optimizations in the EARTH-C compiler to study their impact on the performance of generated multithreaded code. We provide both static and dynamic measurements showing the effect of the optimizations applied individually, and together. We note several general trends, and discuss the performance tradeoffs and suggest when specific optimizations are generally beneficial.


Journal of Parallel and Distributed Computing | 1999

Automatically Partitioning Threads for Multithreaded Architectures

Xinan Tang; Guang R. Gao

There is an enormous amount of parallelism exposed to fine-grain multithreaded architectures to cover latencies. It is a demanding task for a multithreading programmer to manage such a degree of parallelism by hand. To use multithreaded architectures efficiently it is essential to have compiler support for automatically partitioning programs into threads. This paper solves a fundamental problem in compiling for multithreaded architectures, automatically partitioning a program into threads. The focus of such partitioning is to overlap the remote communication latency and minimize the total execution time. We first formulate the partitioning problem based on a multithreaded execution cost model. Then, we prove such a formulation is NP-hard. Therefore, we propose two heuristic thread-partitioning methods to solve this problem in practice. The advanced partitioning algorithm is a novel extension of list scheduling, and it takes advantage of the cost model to generate near-optimum partitioning results. The remote-path-based partitioning algorithm is a simplified version of the advanced one but it is easy for compiler implementation. The two partitioning algorithms were implemented respectively in a thread partitioning testbed and a research EARTH-C compiler. The experimental results show that both partitioning algorithms are effective to generate efficient threaded code, and code generated by the compiler is comparable to hand-written code.


acm symposium on parallel algorithms and architectures | 1998

How “hard” is thread partitioning and how “bad” is a list scheduling based partitioning algorithm?

Xinan Tang; Guang R. Gao

Adequate compiler support is essential to take advantage of the emerging multithreaded architecture. In this paper, we address two important questions in thread partitioning, which is a key step in compiler design for multithreaded architectures. The questions in which we are interested are: how “hard” is it to partition threads and how “bad” will a heuristic partitioning algorithm be? We propose a cost model for both multithreaded machines and user programs, and we formulate the thread partition problem as an optimization problem. Then, we answer the above two questions by proving that: 1) for the class of programs and architecture models we are interested in, the problem of thread partition for minimum execution time is NP-hard; 2) the run length produced by any list scheduling based thread partitioning algorithm is at most twice as long as that of an optimal solution.
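The factor-of-two result above echoes the classical bound for greedy list scheduling on identical processors. The sketch below shows only that textbook setting, under simplifying assumptions (independent tasks, identical workers, no communication latency). It is not the paper's cost model or partitioning algorithm, and the names `list_schedule` and `optimal_lower_bound` are hypothetical.

```python
import heapq


def list_schedule(durations, num_workers):
    """Greedy list scheduling: each task, in list order, goes to the
    worker that becomes free first. Returns the makespan."""
    # Min-heap of the times at which each worker becomes free.
    free_at = [0] * num_workers
    heapq.heapify(free_at)
    for duration in durations:
        start = heapq.heappop(free_at)
        heapq.heappush(free_at, start + duration)
    return max(free_at)


def optimal_lower_bound(durations, num_workers):
    """No schedule can beat the average load or the longest single task."""
    return max(sum(durations) / num_workers, max(durations))


# Illustrative task lengths only (not data from the paper).
tasks = [3, 1, 4, 1, 5, 9, 2, 6]
makespan = list_schedule(tasks, 3)
bound = optimal_lower_bound(tasks, 3)
print(makespan, bound)
```

In this simplified setting the greedy makespan never exceeds twice the lower bound on any optimal schedule, whatever order the tasks arrive in.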


ieee international conference on high performance computing data and analytics | 2000

Design and Implementation of an Efficient Thread Partitioning Algorithm

José Nelson Amaral; Guang R. Gao; Erturk Dogan Kocalar; Patrick O'Neill; Xinan Tang

The development of fine-grain multi-threaded program execution models has created an interesting challenge: how to partition a program into threads that can exploit machine parallelism, achieve latency tolerance, and maintain reasonable locality of reference? A successful algorithm must produce a thread partition that best utilizes multiple execution units on a single processing node and handles long and unpredictable latencies. In this paper, we introduce a new thread partitioning algorithm that can meet the above challenge for a range of machine architecture models. A quantitative affinity heuristic is introduced to guide the placement of operations into threads. This heuristic addresses the trade-off between exploiting parallelism and preserving locality. The algorithm is surprisingly simple due to the use of a time-ordered event list to account for the multiple execution unit activities. We have implemented the proposed algorithm and our experiments, performed on a wide range of examples, have demonstrated its efficiency and effectiveness.


international conference on supercomputing | 2000

Automatic compiler techniques for thread coarsening for multithreaded architectures

Gary M. Zoppetti; Gagan Agrawal; Lori L. Pollock; José Nelson Amaral; Xinan Tang; Guang R. Gao

Multithreaded architectures are emerging as an important class of parallel machines. By allowing fast context switching between threads on the same processor, these systems hide communication and synchronization latencies and allow scalable parallelism for dynamic and irregular applications. Thread partitioning is the most important task in compiling high-level languages for multithreaded architectures. Non-preemptive multithreaded architectures, which can be built from off-the-shelf components, require that if a thread issues a potentially remote memory request, then any statement that is dependent upon this request must be in a separate thread. When performing thread partitioning on codes that use pointer-based recursive data structures, it is often difficult to extract accurate dependence information. As a result, threads of unnecessarily small granularity get generated, which, because of thread switching costs, leads to increased execution time. In this paper, we present three techniques that lead to improved extraction and representation of dependence information in the presence of structured control flow, references through fields of structures, and pointer-based data structures. The benefit of these techniques is the generation of coarser-grained threads and, therefore, decreased execution time. Our experiments were performed using the EARTH-C compiler and the EARTH multithreaded architecture model emulated on both a cluster of Pentium PCs and a distributed memory multiprocessor. On our set of 6 pointer-based programs, these techniques reduced the static number of threads by 38%. Reductions in execution times ranged from 16% to 45% on the four programs we measured runtime performance.
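The non-preemptive constraint described above (a statement that depends on a potentially remote request must sit in a different thread from that request) can be illustrated with a naive level-based partitioner. This is a minimal sketch under assumed inputs, not the EARTH-C compiler's algorithm: the statement tuples, the `is_remote` flag, and the function name `partition_into_threads` are all hypothetical, and thread switching costs are ignored.

```python
def partition_into_threads(statements):
    """Assign each statement a thread index.

    `statements` is a list of (name, deps, is_remote) tuples in
    topological order. A statement lands one thread later than any
    remote request it depends on; local dependences may share a thread.
    """
    thread_of = {}
    remote = {}
    for name, deps, is_remote in statements:
        index = 0
        for dep in deps:
            # Consumers of a remote request must wait in a later thread.
            needed = thread_of[dep] + 1 if remote[dep] else thread_of[dep]
            index = max(index, needed)
        thread_of[name] = index
        remote[name] = is_remote
    return thread_of


# Hypothetical pointer-chasing fragment: p, then p->val and p->next
# (both potentially remote), then uses of the loaded values.
program = [
    ("p", [], False),
    ("v", ["p"], True),    # v = p->val   (potentially remote load)
    ("n", ["p"], True),    # n = p->next  (potentially remote load)
    ("s", ["v"], False),   # s = v + 1
    ("w", ["n"], True),    # w = n->val   (potentially remote load)
    ("t", ["s", "w"], False),
]
threads = partition_into_threads(program)
print(threads)
```

If the dependence analysis can prove that a load is local, marking it `is_remote=False` lets its consumers share a thread with it, which is the coarsening effect the abstract attributes to more accurate dependence information.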


international parallel processing symposium | 1999

Implementing a Non-Strict Functional Programming Language on a Threaded Architecture

Shigeru Kusakabe; Kentaro Inenaga; Makoto Amamiya; Xinan Tang; Andres Marquez; Guang R. Gao

The combination of a language with fine-grain implicit parallelism and a dataflow evaluation scheme is suitable for high-level programming on massively parallel architectures. We are developing a compiler of V, a non-strict functional programming language, for EARTH (Efficient Architecture for Running THreads). Our compiler generates codes in Threaded-C, which is a lower-level programming language for EARTH. We have developed translation rules, and integrated them into the compiler. Since overhead caused by fine-grain processing may degrade performance for programs with little parallelism, we have adopted a thread merging rule. The preliminary performance results are encouraging. Although further improvement is required for non-strict data structures, some codes generated from V programs by our compiler achieved comparable performance with the performance of hand-written Threaded-C codes.


Innovative Architecture for Future Generation High-Performance Processors and Systems | 1998

Implementation of a non-strict functional programming language V on a threaded architecture EARTH

Shigeru Kusakabe; Kentaro Inenaga; Makoto Amamiya; Xinan Tang; Andres Marquez; Guang R. Gao

The combination of a language with fine-grain implicit parallelism and a dataflow evaluation scheme is suitable for high-level programming on massively parallel architectures. We are developing a compiler of V, a non-strict functional programming language, for EARTH (Efficient Architecture for Running THreads). Our compiler generates codes in Threaded-C, which is a lower-level programming language for EARTH. We have developed translation rules, and integrated them into the compiler. While EARTH directly supports fine-grain thread execution, thread-level optimization by compiler is also effective on EARTH. The preliminary performance results are encouraging, although further improvement is required for non-strict data structures. Some codes generated from V programs by our compiler achieved comparable performance with the performance of hand-written Threaded-C codes.


international conference on parallel architectures and compilation techniques | 1995

A design study of the EARTH multiprocessor

Herbert H. J. Hum; Olivier Maquelin; Kevin B. Theobald; Xinmin Tian; Xinan Tang; Guang R. Gao; Phil Cupryk; Nasser Elmasri; Laurie J. Hendren; Alberto Jimenez; Shoba Krishnan; Andres Marquez; Shamir Merali; Shashank S. Nemawarkar; Prakash Panangaden; Xun Xue; Yingchun Zhu

Collaboration


Dive into Xinan Tang's collaboration.

Top Co-Authors

Andres Marquez

Pacific Northwest National Laboratory
