Guanjun Jiang
Zhejiang University
Publications
Featured research published by Guanjun Jiang.
International Conference on Scalable Computing and Communications | 2009
Guanjun Jiang; Du Chen; Binbin Wu; Yi Zhao; Tianzhou Chen; Jingwei Liu
With advances in semiconductor technology, the uniprocessor is being supplanted by the chip multiprocessor (CMP). CMPs run multi-threaded programs efficiently, and many researchers now study multi-threading, including extracting threads from legacy single-threaded programs and standardizing future multi-threaded programs. Data communication among threads is unavoidable in a multi-threaded program, and efficient data sharing is an important factor in program performance. However, prior work on data sharing has focused on memory organization and the relationships among threads, with little attention paid to the intra-processor level. In this paper, we develop a thread assignment method for an architecture in which groups of cores share an L2 cache, guided by the data-sharing relationships among threads: threads that share heavily are allocated to cores in the same group. In our experiments, we simulate four threads with different degrees of data sharing running on a four-core CMP whose cores are divided into two groups. Comparing program execution traces, we find that the main difference between the two configurations is the L2 cache hit rate, and that our thread assignment yields a 6.25% reduction in running time. With the proposed assignment, the L2 cache hit rates of the two groups are 91.0% and 87.1%, whereas with random assignment they drop to 77.0% and 75.4%, a decrease of 14.0% and 11.7% respectively.
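The grouping idea in this abstract lends itself to a small illustration. The sketch below is not the paper's algorithm; the sharing matrix, the group size, and the greedy pairing heuristic are all assumptions made here to show how threads that share the most data could be placed on cores that share one L2 cache.

```python
def assign_threads(sharing, group_size=2):
    """Greedily assign threads to core groups that share an L2 cache.

    sharing[i][j] estimates how much data threads i and j share.
    Threads with high mutual sharing are placed in the same group,
    so the lines they share stay resident in one group's L2 cache.
    """
    unassigned = set(range(len(sharing)))
    groups = []
    while unassigned:
        # Seed a new group with the thread that shares the most overall.
        seed = max(unassigned, key=lambda t: sum(sharing[t][u] for u in unassigned))
        group = [seed]
        unassigned.remove(seed)
        while len(group) < group_size and unassigned:
            # Add the remaining thread with the strongest ties to this group.
            best = max(unassigned, key=lambda t: sum(sharing[g][t] for g in group))
            group.append(best)
            unassigned.remove(best)
        groups.append(group)
    return groups

# Four threads on a four-core CMP split into two groups of two cores:
# threads 0/1 share heavily with each other, as do threads 2/3.
sharing = [[0, 9, 1, 1],
           [9, 0, 1, 2],
           [1, 1, 0, 8],
           [1, 2, 8, 0]]
print(assign_threads(sharing))   # e.g. [[1, 0], [2, 3]]: 0/1 grouped, 2/3 grouped
```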
Advanced Parallel Programming Technologies | 2009
Guanjun Jiang; Degui Feng; Liangliang Tong; Lingxiang Xiang; Chao Wang; Tianzhou Chen
In recent years, with single-processor performance improvements possibly coming to an end, more and more researchers have shifted to chip multiprocessors (CMPs). The burgeoning of multi-threaded programs brings dramatically increased inter-core communication. Unfortunately, traditional architectures fail to meet this challenge, as they carry out such communication in the last level of on-chip cache or even in memory. This paper proposes a novel approach, called Collective Cache, which differentiates accesses to shared and private data and handles data communication at the first-level cache. In the proposed architecture, shared data found in the last-level cache are moved into the Collective Cache, an L1 structure shared by all cores. We show that the proposed mechanism can greatly improve inter-core communication, increase the usage efficiency of the L1 cache, and simplify the data consistency protocol. Extensive analysis of this approach with Simics shows that it can reduce the L1 cache miss rate by 3.36%.
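To illustrate the shared/private split described above, here is a toy model, not the paper's implementation: the per-line ownership tracking and the "promote on second core" rule are assumptions made for the sketch.

```python
class CollectiveCacheModel:
    """Toy model of routing accesses to a private L1 or a shared 'collective' L1.

    A cache line starts out private to the first core that touches it.
    Once a second core accesses the same line, it is promoted to the
    collective cache, which all cores then hit directly instead of
    communicating through the last-level cache.
    """
    def __init__(self):
        self.owner = {}          # line address -> first core that touched it
        self.collective = set()  # line addresses promoted to the shared L1

    def access(self, core, line):
        if line in self.collective:
            return "collective-L1 hit"
        if line not in self.owner:
            self.owner[line] = core
            return "private-L1 fill"
        if self.owner[line] == core:
            return "private-L1 hit"
        # A second core touched the line: treat it as shared data and
        # move it into the collective cache.
        self.collective.add(line)
        del self.owner[line]
        return "promoted to collective L1"

m = CollectiveCacheModel()
print(m.access(0, 0x100))  # private-L1 fill
print(m.access(0, 0x100))  # private-L1 hit
print(m.access(1, 0x100))  # promoted to collective L1
print(m.access(1, 0x100))  # collective-L1 hit
```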
International Conference on Algorithms and Architectures for Parallel Processing | 2010
Tianzhou Chen; Xingsheng Tang; Jianliang Ma; Lihan Ju; Guanjun Jiang; Qingsong Shi
CMPs offer clear benefits over single-core processors, but they do not suit legacy code well, since legacy code was written for a single core. This paper presents a novel thread-level parallelism technique called Dataflow Abstracting Thread (DFAT). DFAT builds a United Dependence Graph (UDG) for a program and decouples a single thread into many threads that can run on a CMP in parallel. DFAT analyzes the program's data, control, and anti-dependences to obtain a dependence graph; the dependences are then combined and annotated with attributes to form the UDG. The UDG determines the execution order of instructions, and based on it, instructions are assigned to threads one by one; an algorithm decides how to make this assignment. After the initial thread division, DFAT considers both communication overhead and thread balance. Thread communication in DFAT follows a producer-consumer model. DFAT can automatically extract multiple threads from a single thread and can be implemented in a compiler. In our evaluation, we decouple a single thread into at most 8 threads with DFAT; the results show that decoupling into 4-6 threads gives the best benefit.
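The assignment step described above can be sketched with a simple greedy heuristic. This is not DFAT's algorithm; the affinity-minus-load scoring and the data structures are assumptions made here to show one way instructions could be distributed over threads while trading communication against balance.

```python
from collections import defaultdict
from graphlib import TopologicalSorter  # Python 3.9+

def partition(deps, num_threads):
    """Assign instructions to threads from a dependence graph.

    deps maps each instruction to the set of instructions it depends on.
    Instructions are visited in topological order; each one goes to the
    thread scoring highest on "producers already there" minus "current
    load", which favors cutting cross-thread communication while keeping
    the threads roughly balanced.
    """
    assignment = {}
    load = [0] * num_threads
    for insn in TopologicalSorter(deps).static_order():
        affinity = defaultdict(int)
        for producer in deps.get(insn, ()):
            affinity[assignment[producer]] += 1
        best = max(range(num_threads), key=lambda t: affinity[t] - load[t])
        assignment[insn] = best
        load[best] += 1
    return assignment

# A small dependence graph: i3 depends on i1 and i2, which both depend on i0.
deps = {"i0": set(), "i1": {"i0"}, "i2": {"i0"}, "i3": {"i1", "i2"}}
print(partition(deps, num_threads=2))
```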
Advanced Parallel Programming Technologies | 2009
Degui Feng; Guanjun Jiang; Tiefei Zhang; Wei Hu; Tianzhou Chen; Mingteng Cao
With the progress of semiconductor technology, the chip multiprocessor (CMP) has become the mainstream of processor design, providing higher thread concurrency than the traditional single-core processor. Lock-based synchronization of multiple threads has proved inefficient and carries high overhead, and previous work shows that transactional memory (TM) is an efficient alternative for multi-thread synchronization. This paper presents SPMTM, a novel nested TM framework based on on-chip memory. The on-chip memory used in this framework is not cache but scratchpad memory (SPM), a software-controlled on-chip SRAM. SPMTM stores TM metadata in the SPM to increase access speed and reduce power consumption. Experimental results show that SPMTM achieves an average 16.3% performance improvement over lock-based synchronization on the benchmarks, and the improvement grows as the number of processor cores increases.
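As a rough illustration of keeping transaction bookkeeping in a fast, software-managed region, the sketch below is my own simplification rather than SPMTM's design: the `spm` dictionary standing in for the scratchpad, the version numbers, and the commit rule are all assumptions.

```python
class Transaction:
    """Minimal software-TM sketch with bookkeeping held in one structure.

    'spm' stands in for the scratchpad region: it holds the read set
    (address -> version seen) and the write buffer (address -> new value).
    Commit succeeds only if every address that was read still has the
    same version, i.e. no other transaction committed to it meanwhile.
    """
    def __init__(self, memory, versions):
        self.memory, self.versions = memory, versions
        self.spm = {"read_set": {}, "write_buf": {}}

    def read(self, addr):
        if addr in self.spm["write_buf"]:
            return self.spm["write_buf"][addr]
        self.spm["read_set"][addr] = self.versions.get(addr, 0)
        return self.memory.get(addr, 0)

    def write(self, addr, value):
        self.spm["write_buf"][addr] = value

    def commit(self):
        for addr, seen in self.spm["read_set"].items():
            if self.versions.get(addr, 0) != seen:
                return False              # conflict: another commit intervened
        for addr, value in self.spm["write_buf"].items():
            self.memory[addr] = value
            self.versions[addr] = self.versions.get(addr, 0) + 1
        return True

memory, versions = {"x": 1}, {}
tx = Transaction(memory, versions)
tx.write("x", tx.read("x") + 1)
print(tx.commit(), memory)   # True {'x': 2}
```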
Archive | 2008
Tianzhou Chen; Wei Hu; Mingteng Cao; Qingsong Shi; Like Yan; Bin Xie; Degui Feng; Gang Wang; Guanjun Jiang; Yujie Wang
Archive | 2010
Lingxiang Xiang; Tianzhou Chen; Lianghua Miao; Guanjun Jiang; Fuming Qiao; Du Chen; Jianliang Ma; Chunhao Wang; Tiefei Zhang; Man Cao
Archive | 2010
Tianzhou Chen; Wei Hu; Guanjun Jiang; Jingxian Li; Qingsong Shi; Hui Yuan
Archive | 2009
Tianzhou Chen; Guanjun Jiang; Lianghua Miao; Chao Wang; Jian Chen
Archive | 2008
Tianzhou Chen; Wei Hu; Qingsong Shi; Like Yan; Bin Xie; Jiangwei Huang; Tiefei Zhang; Degui Feng; Lingxiang Xiang; Guanjun Jiang