Publication


Featured research published by Zhimin Tang.


IEEE International Conference on High Performance Computing, Data, and Analytics | 1999

JIAJIA: A Software DSM System Based on a New Cache Coherence Protocol

Weiwu Hu; Weisong Shi; Zhimin Tang

This paper describes the design and evaluation of a software distributed shared memory (DSM) system called JIAJIA. JIAJIA is a home-based software DSM system in which the physical memories of multiple computers are combined to form a larger shared space. It implements a lock-based cache coherence protocol that eliminates the directory entirely and maintains coherence through write notices kept on the lock. Our experiments with widely accepted DSM benchmarks such as the SPLASH-2 program suite and the NAS Parallel Benchmarks indicate that, compared with recent software DSMs such as CVM, JIAJIA achieves higher performance. Besides, JIAJIA can solve large problems that cannot be solved by other software DSMs due to memory size limitations.
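To make the home-based organization concrete, here is a minimal C sketch, under assumed page and region sizes, of how such a system can derive the home node of a shared address when each node contributes a fixed-size region to the shared space. The names and constants are illustrative assumptions, not JIAJIA's actual code.

/* Hypothetical sketch of home-based address mapping; names, sizes, and
 * layout are assumptions for illustration, not JIAJIA's actual code.  */
#include <stdio.h>

#define PAGE_SIZE       4096UL
#define PAGES_PER_NODE  1024UL   /* assumed: each node contributes 4 MB */

/* The shared space is the concatenation of the regions contributed by
 * the nodes, so the home node of a page follows directly from the
 * global page number.                                                  */
static unsigned long home_of(unsigned long shared_addr)
{
    unsigned long page = shared_addr / PAGE_SIZE;
    return page / PAGES_PER_NODE;
}

int main(void)
{
    unsigned long addr = 5UL * PAGES_PER_NODE * PAGE_SIZE + 123;
    printf("address %lu is homed on node %lu\n", addr, home_of(addr));
    return 0;
}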


International Parallel Processing Symposium | 1999

Reducing system overheads in home-based software DSMs

Weiwu Hu; Weisong Shi; Zhimin Tang

Software DSM systems suffer from the high communication and coherence-induced overheads that limit performance. This paper introduces our efforts to reduce the system overheads of a home-based software DSM called JIAJIA. Three measures are taken to reduce the system overhead of JIAJIA: eliminating false sharing by avoiding unnecessary invalidation of cached pages, reducing virtual memory page faults with a new write detection scheme, and propagating barrier messages in a hierarchical way. Evaluation with several well-known DSM benchmarks reveals that, although the effect varies with the memory reference patterns of different applications, these measures reduce the system overhead of JIAJIA effectively.
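As a rough illustration of the hierarchical barrier propagation mentioned above, the following C sketch fans arrival messages up a fixed-degree tree and release messages back down, so no node handles more than FANOUT messages per phase. send_msg/recv_msg are assumed placeholders for the DSM's message layer, not JIAJIA's actual interface.

/* Hedged sketch of tree-structured barrier propagation; all names are
 * illustrative placeholders, not JIAJIA's real message interface.     */
enum { BARR_ARRIVE, BARR_RELEASE };

void send_msg(int to, int tag);     /* assumed blocking send to a node  */
void recv_msg(int from, int tag);   /* assumed blocking receive         */

#define FANOUT 4

void tree_barrier(int self, int nprocs)
{
    int first_child = self * FANOUT + 1;

    /* fan-in: wait for all children, then report arrival upward */
    for (int c = first_child; c < first_child + FANOUT && c < nprocs; c++)
        recv_msg(c, BARR_ARRIVE);
    if (self != 0)
        send_msg((self - 1) / FANOUT, BARR_ARRIVE);

    /* fan-out: wait for release from parent, then release children */
    if (self != 0)
        recv_msg((self - 1) / FANOUT, BARR_RELEASE);
    for (int c = first_child; c < first_child + FANOUT && c < nprocs; c++)
        send_msg(c, BARR_RELEASE);
}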


Journal of Computer Science and Technology | 1998

A lock-based cache coherence protocol for scope consistency

Weiwu Hu; Weisong Shi; Zhimin Tang; Ming Li

Directory protocols are widely adopted to maintain cache coherence in distributed shared memory multiprocessors. Although scalable to a certain extent, directory protocols are complex enough to prevent them from being used in very large scale multiprocessors with tens of thousands of nodes. This paper proposes a lock-based cache coherence protocol for scope consistency. It does not rely on directory information to maintain cache coherence. Instead, coherence is maintained by requiring the processor releasing a lock to store all write notices generated in the associated critical section with the lock, while the acquiring processor invalidates or updates its locally cached copies according to the write notices of the lock. To evaluate the performance of the lock-based cache coherence protocol, a software DSM system named JIAJIA is built on a network of workstations. Besides the lock-based cache coherence protocol, JIAJIA is also characterized by its shared memory organization scheme, which combines the physical memories of multiple workstations to form a large shared space. Performance measurements with the SPLASH-2 program suite and the NAS benchmarks indicate that, compared with recent SVM systems such as CVM, JIAJIA achieves higher speedups. Besides, JIAJIA can solve large-scale problems that cannot be solved by other SVM systems due to memory size limitations.
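The C sketch below illustrates the release/acquire idea described in the abstract, with write notices stored alongside the lock instead of in a directory: the releaser records the pages it dirtied, and the next acquirer invalidates its stale copies. All structures and function names are illustrative assumptions, not the paper's actual protocol code.

/* Hedged sketch of lock-based coherence: write notices travel with the
 * lock rather than living in a directory. Illustrative only.           */
#define MAX_NOTICES 256

struct lock_meta {
    int  owner;                      /* node currently holding the lock */
    int  n_notices;
    long notice_page[MAX_NOTICES];   /* pages written in past critical
                                        sections guarded by this lock   */
};

/* On release: record the pages this node dirtied in the critical
 * section so later acquirers can see which cached copies are stale.    */
void release_lock(struct lock_meta *lk, const long *dirty, int ndirty)
{
    for (int i = 0; i < ndirty && lk->n_notices < MAX_NOTICES; i++)
        lk->notice_page[lk->n_notices++] = dirty[i];
    lk->owner = -1;
}

/* On acquire: invalidate local copies of every page named in the
 * lock's write notices before entering the critical section.           */
void acquire_lock(struct lock_meta *lk, int self,
                  void (*invalidate_page)(long page))
{
    for (int i = 0; i < lk->n_notices; i++)
        invalidate_page(lk->notice_page[i]);
    lk->owner = self;
}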


Cluster Computing | 2001

Optimizing Home-Based Software DSM Protocols

Weiwu Hu; Weisong Shi; Zhimin Tang

Software DSMs can be categorized into homeless and home-based systems; each has strengths and weaknesses compared with the other. This paper introduces optimization methods that exploit the advantages and offset the disadvantages of the home-based protocol in the home-based software DSM JIAJIA. The first optimization reduces the overhead of writes to home pages through a lazy home page write detection scheme. The normal write detection scheme write-protects shared pages at the beginning of a synchronization interval, while lazy home page write detection delays write-protecting a home page until the page is first fetched in the interval, so that home pages that are not cached by remote processors need not be write-protected. The second optimization avoids fetching a whole page on a page fault by dividing a page into blocks and fetching only those blocks that are dirty with respect to the faulting processor. A write vector table is maintained for each shared page at its home to record, for each processor, which blocks have been modified since that processor last fetched the page. The third optimization adaptively migrates the home of a page to the processor that writes to the page most frequently, reducing twin and diff overhead. Migration information is piggybacked on barrier messages, and no additional communication is required for the migration. Performance evaluation with well-accepted benchmarks and real applications shows that these optimizations reduce page faults, message volume, and diffs dramatically, and consequently improve performance significantly.
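A minimal sketch of the write vector table idea follows, assuming a 4 KB page divided into 32 blocks: the home keeps, per processor, a bitmask of blocks modified since that processor last fetched the page, and a fault reply carries only those blocks. The names and sizes are assumptions, not JIAJIA's actual layout.

/* Hedged sketch of a per-page write vector table; illustrative only.   */
#include <stdint.h>

#define BLOCKS_PER_PAGE 32           /* assumed: 4 KB page, 128 B blocks */
#define MAX_PROCS       64

struct write_vector {
    uint32_t dirty_since_fetch[MAX_PROCS];  /* bit i set => block i dirty
                                               for that processor        */
};

/* Home side: a write to block `block` marks it dirty for every remote
 * processor that might still hold an old copy of the page.             */
void note_home_write(struct write_vector *wv, int block)
{
    for (int p = 0; p < MAX_PROCS; p++)
        wv->dirty_since_fetch[p] |= 1u << block;
}

/* On a page fault from processor p, the home sends only the blocks
 * whose bits are set for p, then clears p's vector.                    */
uint32_t blocks_to_send(struct write_vector *wv, int p)
{
    uint32_t mask = wv->dirty_since_fetch[p];
    wv->dirty_since_fetch[p] = 0;
    return mask;
}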


Operating Systems Review | 1997

An interaction of coherence protocols and memory consistency models in DSM systems

Weisong Shi; Weiwu Hu; Zhimin Tang

Coherence protocols and memory consistency models are two important issues in hardware coherent shared memory multiprocessors and software distributed shared memory (DSM) systems. Over the years, many researchers have studied these two issues extensively, but separately; the interaction between them has not been studied in the literature. In this paper, we study the coherence protocols and memory consistency models used by hardware and software DSM systems in detail. Based on our analysis, we give a general definition of a memory consistency model: a memory consistency model is the logical sum of the ordering of events in each processor and the coherence protocol. We also point out that in hardware DSM systems the emphasis of the memory consistency model is on relaxing the restrictions on event ordering, while in software DSM systems the memory consistency model focuses mainly on relaxing the coherence protocol. Taking Lazy Release Consistency (LRC) as an example, we analyze the relationship between coherence protocols and memory consistency models in software DSM systems, and find that whether the advantages of LRC can be exploited depends greatly on its corresponding protocol. We conclude that the more relaxed the consistency model is, the more relaxed the coherence protocol needed to support it. This conclusion is very useful when designing a new consistency model. Furthermore, we make some improvements to the traditional multiple writer protocol and, as far as we are aware, describe the complex state transitions of the multiple writer protocol for the first time. Finally, we list the main research directions for memory consistency models in hardware and software DSM systems.
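Since the discussion builds on the multiple writer protocol, here is a generic twin/diff sketch in C of the mechanism such protocols typically use to let several processors write the same page concurrently. It illustrates the general technique only; it is not the state machine described in the paper.

/* Hedged sketch of the twin/diff mechanism used by multiple-writer
 * protocols; a generic illustration, not this paper's protocol.        */
#include <stdlib.h>
#include <string.h>

#define PAGE_SIZE 4096

/* Before the first write in an interval, save a pristine copy (twin).  */
unsigned char *make_twin(const unsigned char *page)
{
    unsigned char *twin = malloc(PAGE_SIZE);
    if (twin) memcpy(twin, page, PAGE_SIZE);
    return twin;
}

/* At release time, compare the page with its twin to build a diff of
 * the bytes this writer changed; concurrent writers to the same page
 * produce non-overlapping diffs that are later merged at the home.     */
size_t make_diff(const unsigned char *page, const unsigned char *twin,
                 unsigned char *diff_buf)
{
    size_t n = 0;
    for (size_t i = 0; i < PAGE_SIZE; i++)
        if (page[i] != twin[i]) {
            /* record (offset, value); a real diff would batch runs      */
            memcpy(diff_buf + n, &i, sizeof i);
            n += sizeof i;
            diff_buf[n++] = page[i];
        }
    return n;
}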


International Symposium on Parallel Architectures, Algorithms, and Networks | 1999

Dynamic computation scheduling for load balancing in home-based software DSMs

Weisong Shi; Zhimin Tang

Load balancing is a critical issue for achieving good performance in parallel and distributed systems, but it has been neglected in software DSM research over the past decade. In this paper, we present and evaluate a dynamic computation scheduling scheme for load balancing of iterative applications in a software DSM system. The experimental platform is a home-based DSM system named JIAJIA. Preliminary results show that this load balancing scheme is efficient and can be used in other software DSM systems. Compared with the simple chunk self-scheduling scheme, which works well for single-iteration applications, system performance is improved by about 30% with the affinity-based self-scheduling proposed in this paper.
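The C sketch below captures the affinity-based self-scheduling idea in its simplest form: each chunk remembers which processor ran it last, a processor first re-takes its own chunks, and only then helps with leftovers, moving their affinity to itself for the next iteration. In a real DSM the shared arrays would live in shared memory and be updated under a lock; that machinery, and the names used here, are assumptions for illustration.

/* Hedged sketch of affinity-based chunk self-scheduling; shared-memory
 * locking omitted, names illustrative.                                  */
#define NCHUNKS 1024

static int owner[NCHUNKS];    /* processor that ran the chunk last time  */
static int claimed[NCHUNKS];  /* claimed in the current iteration?       */

int get_chunk(int self)
{
    /* phase 1: affinity - prefer chunks this processor ran before       */
    for (int i = 0; i < NCHUNKS; i++)
        if (!claimed[i] && owner[i] == self) { claimed[i] = 1; return i; }

    /* phase 2: load balancing - take any remaining chunk and move its
     * affinity to this processor for the next iteration                 */
    for (int i = 0; i < NCHUNKS; i++)
        if (!claimed[i]) { claimed[i] = 1; owner[i] = self; return i; }

    return -1;    /* iteration finished */
}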


International Conference on Computer Design | 2006

Microarchitecture and Performance Analysis of Godson-2 SMT Processor

Zusong Li; Xianchao Xu; Weiwu Hu; Zhimin Tang

This paper introduces the microarchitecture and logical implementation of the SMT (simultaneous multithreading) extension of the Godson-2 processor, a 64-bit, four-issue, out-of-order high-performance processor. The conditions for implementing a correct memory consistency model in the Godson-2 SMT processor are studied, and a new register-level sharing and synchronization scheme is proposed. The Godson-2 SMT processor has been implemented at the RTL level and simulated with the VstationPro of Mentor Graphics. The Linux operating system has been ported to the Godson-2 SMT processor, and application programs such as the SPEC CPU2000 benchmark suite are used to evaluate performance. Experimental results indicate that the performance of the Godson-2 SMT processor is improved significantly by fully exploiting thread-level parallelism and making better use of the functional units. The average speedup is 31.3% with an 18.8% area overhead.


IEEE International Conference on High Performance Computing, Data, and Analytics | 2000

Running real applications on software DSMs

Weiwu Hu; Fuxin Zhang; Li Ren; Weisong Shi; Zhimin Tang

This paper introduces our experience with some real applications on the home-based software DSM JIAJIA and discusses techniques for parallelizing a sequential program to run on a software DSM. It categorizes parallel program segments into five patterns: single-process sequential, mutual-exclusive sequential, data-parallel, task-parallel, and common-parallel. The usage of each pattern is then discussed with the real applications as examples. With some guidance from their owners, these programs were parallelized to the API of JIAJIA in a very short time. Satisfactory speedups are achieved for them on a cluster of eight Pentium II PCs connected by a 100 Mbps switched Ethernet. Our experience implies that, with the advances in software DSMs and network technologies, the time has come to push software DSM into the parallel processing mainstream, and software DSM researchers should make efforts to expand the applications of software DSMs.
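To show what the data-parallel pattern looks like against a software DSM, here is a short C sketch written against a generic DSM API. The names dsm_init, dsm_alloc, dsm_barrier, dsm_pid, and dsm_nprocs are illustrative placeholders; JIAJIA's actual primitives may be named and typed differently.

/* Hedged sketch of the data-parallel pattern on a generic software DSM;
 * the dsm_* names are placeholders, not JIAJIA's real API.              */
#include <stddef.h>

void   dsm_init(int argc, char **argv);   /* assumed: join the DSM       */
void  *dsm_alloc(size_t bytes);           /* assumed: allocate shared    */
void   dsm_barrier(void);                 /* assumed: global barrier     */
extern int dsm_pid, dsm_nprocs;           /* assumed: my id, #processes  */

#define N 1000000

void scale_vector(int argc, char **argv)
{
    dsm_init(argc, argv);
    double *v = dsm_alloc(N * sizeof *v);  /* shared across all nodes    */

    /* data-parallel: each process owns a contiguous slice of indices    */
    int per = (N + dsm_nprocs - 1) / dsm_nprocs;
    int lo  = dsm_pid * per;
    int hi  = lo + per > N ? N : lo + per;

    for (int i = lo; i < hi; i++)
        v[i] *= 2.0;

    dsm_barrier();    /* writes become visible to the other processes    */
}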


International Performance, Computing, and Communications Conference | 2000

A novel multicast scheme to reduce cache invalidation overheads in DSM systems

Zhiyu Zhou; Weisong Shi; Zhimin Tang

Directory-based write-invalidate cache coherence protocols have been widely used in distributed shared memory (DSM) systems, in which cache invalidation overheads occupy a large part of the system overhead. In this paper, we propose a novel tree-based multidestination multicast scheme, TBM, which involves a new, efficient multidestination message format. TBM combines the best features of two existing approaches: tree-based multicast and multidestination message passing. With the new scheme, only one invalidation message and fewer than 2^(⌈log2(n+1)⌉-1) acknowledgement messages are required in one cache invalidation transaction when n processor nodes hold copies of the cache block. Detailed analysis and simulation on a 2D mesh show that TBM is preferable to the traditional Umesh, Hamiltonian Path, and BRCP-HL multicast schemes, which indicates that current and future DSM systems can take advantage of this scheme to deliver better performance.
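Reading the bound in the abstract as 2^(⌈log2(n+1)⌉-1), the small C program below tabulates it for a few sharer counts and contrasts it with the n acknowledgements a unicast scheme would need; the interpretation of the garbled exponent is an assumption, so treat the numbers as illustrative.

/* Tabulate the acknowledgement bound quoted above for a few values of
 * n (number of nodes holding copies). Compile with -lm.                */
#include <math.h>
#include <stdio.h>

int main(void)
{
    for (int n = 3; n <= 63; n += 12) {
        int bound = 1 << ((int)ceil(log2(n + 1)) - 1);
        printf("n = %2d copies: < %2d acks (vs. %2d with unicast)\n",
               n, bound, n);
    }
    return 0;
}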


Journal of Computer Science and Technology | 2000

Using confidence interval to summarize the evaluating results of DSM systems

Weisong Shi; Zhimin Tang; Jinsong Shi

Distributed shared memory (DSM) systems have gained popular acceptance by combining the scalability and low cost of distributed systems with the ease of use of a single address space. Many new hardware DSM and software DSM systems have been proposed in recent years. In general, benchmarking is widely used to demonstrate the performance advantages of new systems. However, the common method used to summarize the measured results is the arithmetic mean of ratios, which is incorrect in some cases. Furthermore, many published papers only list large amounts of data without summarizing them effectively, which greatly confuses readers. In fact, many readers want a single number as a conclusion, which older summarizing techniques do not provide. Therefore, a new data summarizing technique based on confidence intervals is proposed in this paper. The new technique includes two data summarizing methods: (1) the paired confidence interval method and (2) the unpaired confidence interval method. With this new technique, one can conclude at some confidence level that one system is better than another. Four examples are shown to demonstrate the advantages of the technique. Furthermore, with the help of confidence levels, it is proposed to standardize the benchmarks used for evaluating DSM systems so that convincing results can be obtained. In addition, the new summarizing technique applies not only to evaluating DSM systems, but also to evaluating other systems, such as memory systems and communication systems.
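The following C sketch illustrates the paired confidence interval method in its textbook form: take per-benchmark differences between the two systems, compute the mean and standard error, and report a t-based interval; if the interval excludes zero, one system is faster at that confidence level. The benchmark data are made up for illustration and the 95% t quantile is hard-coded for eight samples; this is a generic statistical sketch, not the paper's tool. Compile with -lm.

/* Hedged sketch of the paired confidence interval method; data and
 * names are illustrative assumptions.                                  */
#include <math.h>
#include <stdio.h>

int main(void)
{
    /* assumed example data: execution times (s) of 8 benchmarks        */
    double a[] = { 10.2, 8.7, 15.1, 9.9, 12.4, 7.8, 11.0, 14.3 };
    double b[] = { 11.0, 9.5, 14.8, 11.2, 13.1, 8.9, 11.7, 15.0 };
    int n = 8;
    double t975_df7 = 2.365;           /* t quantile, 95%, 7 d.o.f.      */

    double mean = 0.0, var = 0.0;
    for (int i = 0; i < n; i++) mean += (a[i] - b[i]) / n;
    for (int i = 0; i < n; i++) {
        double d = (a[i] - b[i]) - mean;
        var += d * d / (n - 1);
    }
    double half = t975_df7 * sqrt(var / n);

    printf("mean difference %.3f, 95%% CI [%.3f, %.3f]\n",
           mean, mean - half, mean + half);
    if (mean - half > 0.0 || mean + half < 0.0)
        printf("the interval excludes 0: one system is faster at 95%%\n");
    else
        printf("no significant difference at 95%% confidence\n");
    return 0;
}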

Collaboration


Dive into Zhimin Tang's collaborations.

Top Co-Authors

Weisong Shi (Wayne State University)
Weiwu Hu (Chinese Academy of Sciences)
Dongrui Fan (Chinese Academy of Sciences)
Da Wang (Chinese Academy of Sciences)
Xiaochun Ye (Chinese Academy of Sciences)
Wenming Li (Chinese Academy of Sciences)
Xiaowei Shen (Chinese Academy of Sciences)
Xu Tan (Chinese Academy of Sciences)
Gang Shi (Chinese Academy of Sciences)
Hao Zhang (Chinese Academy of Sciences)