Lingxiang Xiang | Researchain

Publication

Featured researches published by Lingxiang Xiang.

international conference on supercomputing | 2009

Less reused filter: improving L2 cache performance via filtering less reused lines

Lingxiang Xiang; Tianzhou Chen; Qingsong Shi; Wei Hu

The L2 cache is commonly managed using the LRU policy. For workloads whose working set is larger than the L2 cache, LRU behaves poorly, producing a large number of less reused lines, which are never reused or are reused only a few times. In this case, cache performance can be improved by retaining a portion of the working set in the cache for a sufficiently long period. Previous schemes approach this by bypassing never reused lines. However, they are constrained by the number of never reused lines and sometimes deliver no benefit when such lines are scarce. This paper proposes a new filtering mechanism that filters out the less reused lines rather than just the never reused lines. The extended scope of bypassing provides more opportunities to fit the working set into the cache. This paper also proposes the Less Reused Filter (LRF), a separate structure that precedes the L2 cache, to implement this mechanism. LRF employs a reuse frequency predictor to accurately identify the less reused lines among incoming lines. Meanwhile, based on our observation that most less reused lines have a short life span, LRF places the filtered lines into a small filter buffer to fully utilize them, avoiding extra misses. Our evaluation, for 24 SPEC 2000 benchmarks, shows that augmenting a 512KB LRU-managed L2 cache with an LRF having a 32KB filter buffer reduces the average MPKI by 27.5%, narrowing the gap between LRU and OPT by 74.4%.
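The claim that LRU behaves poorly when the working set exceeds the cache, and that retaining part of the working set helps, can be illustrated with a toy simulation. This is only a sketch under stated assumptions, not the paper's LRF: the function names, the fully associative cache model, and the naive "bypass every miss once the cache is full" policy are all hypothetical simplifications, and there is no reuse frequency predictor or filter buffer here.

```python
from collections import OrderedDict


def lru_hits(trace, capacity):
    """Count hits for a fully associative cache managed with LRU."""
    cache = OrderedDict()  # keys ordered from least to most recently used
    hits = 0
    for line in trace:
        if line in cache:
            hits += 1
            cache.move_to_end(line)  # mark as most recently used
        else:
            if len(cache) >= capacity:
                cache.popitem(last=False)  # evict the least recently used line
            cache[line] = True
    return hits


def bypass_hits(trace, capacity):
    """Count hits for a toy policy (an assumption, not LRF): once the
    cache is full, a missing line is bypassed instead of evicting a
    resident line, so part of the working set stays cached."""
    cache = set()
    hits = 0
    for line in trace:
        if line in cache:
            hits += 1
        elif len(cache) < capacity:
            cache.add(line)  # fill while there is free space
        # otherwise: bypass the incoming line, keep the residents
    return hits


# Cyclic working set of 5 lines accessed 10 times, cache holds 4 lines.
trace = list(range(5)) * 10
lru = lru_hits(trace, 4)
byp = bypass_hits(trace, 4)
print(lru, byp)
```

On this cyclic trace LRU evicts each line just before it is needed again, while the bypassing variant keeps four of the five lines resident. A real filter would need a predictor to decide which lines to bypass, which is the part this sketch leaves out.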

advanced parallel programming technologies | 2009

L1 Collective Cache: Managing Shared Data for Chip Multiprocessors

Guanjun Jiang; Degui Fen; Liangliang Tong; Lingxiang Xiang; Chao Wang; Tianzhou Chen

In recent years, with the possible end of further improvements in single-processor performance, more and more researchers have shifted to Chip Multiprocessors (CMPs). The growth of multi-threaded programs brings dramatically increased inter-core communication. Unfortunately, traditional architectures fail to meet this challenge, as they conduct such communication in the last-level on-chip cache or even in memory. This paper proposes a novel approach, called Collective Cache, to differentiate accesses to shared and private data and to handle data communication in the first-level cache. In the proposed cache architecture, the shared data found in the last-level cache are moved into the Collective Cache, an L1 cache structure shared by all cores. We show that the proposed mechanism can greatly enhance inter-processor communication, increase the usage efficiency of the L1 cache, and simplify the data consistency protocol. Extensive analysis of this approach with Simics shows that it can reduce the L1 cache miss rate by 3.36%.

international conference on future generation communication and networking | 2008

Coordinating System Software for Power Savings

Lingxiang Xiang; Jiangwei Huang; Tianzhou Chen

Power consumption is becoming a primary concern as a result of the tremendous increase in computer power usage. Innumerable methods and techniques have been exploited to address this problem, but few concentrate on collaborative approaches. This paper presents coordination mechanisms that integrate operating systems with compilers under power reduction techniques such as DPM and DVS. By retaining as much as possible of the information generated at compile time about an application's structure and performance characteristics, and committing it to system software at run time, the system software, especially the operating system and compiler, collaborates toward fine-grain power optimizations. The proposed coordination mechanisms also make it possible to integrate the power optimization approaches in our prior work into a whole system. Thus, the optimizations working at distinct levels can be overlaid at run time, and the power reduction effect can be enhanced.

green computing and communications | 2010

Shared Register File Based ILP for Multicore

Lihan Ju; Wei Hu; Lingxiang Xiang; Tianzhou Chen

With the development of the semiconductor industry, more transistors can be integrated onto a single chip. However, the software programming model does not fit the parallelism requirements of CMP (Chip Multi Processor) based architectures. Communication between different cores becomes a very serious problem and degrades performance. This paper proposes an approach called API (Architecture of Parallelism on Instructions), which scans the source code of programs, analyzes the data dependencies, and clusters dependent instructions together. Instructions without dependencies can be issued directly in parallel by different cores. API provides a global register file for the effective execution of programs on CMP chips. We have also compared the execution time of API with that of the traditional architecture in our experiments using the SPEC CPU2000 benchmark. The experimental results show that the instruction clock count in API is only 49 percent of the original instruction clock count. Moreover, only 4 cores are needed to approach the best performance.
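The idea of analyzing data dependencies and clustering dependent instructions can be sketched roughly as follows. This is a hypothetical illustration, not the paper's API design: the `(dest, sources)` instruction encoding and the function name `cluster_by_dependency` are assumptions, and only read-after-write dependencies through registers are tracked.

```python
def cluster_by_dependency(instrs):
    """Group instruction indices so that instructions linked by
    read-after-write register dependencies share a cluster.

    instrs is a list of (dest_register, [source_registers]) tuples in
    program order. This encoding is an assumption for illustration.
    """
    parent = list(range(len(instrs)))  # union-find forest over indices

    def find(i):
        # Find the cluster representative, compressing the path as we go.
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    last_writer = {}  # register name -> index of its most recent writer
    for i, (dest, srcs) in enumerate(instrs):
        for src in srcs:
            if src in last_writer:
                # i reads a value produced earlier: merge the two clusters.
                parent[find(i)] = find(last_writer[src])
        last_writer[dest] = i

    groups = {}
    for i in range(len(instrs)):
        groups.setdefault(find(i), []).append(i)
    return sorted(groups.values())


# Two independent chains: r1 -> r3 -> r5 and r2 -> r4.
program = [
    ("r1", []),
    ("r2", []),
    ("r3", ["r1"]),
    ("r4", ["r2"]),
    ("r5", ["r3"]),
]
clusters = cluster_by_dependency(program)
print(clusters)
```

Each resulting cluster has no register dependency on the others in this model, so the clusters are candidates for issue on different cores. Memory dependencies and write-after-write hazards would need separate handling.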

computer and information technology | 2010

Global Register Alias Table: Executing Sequential Program on Multi-Core

Chunhao Wang; Lihan Ju; Di Wu; Lingxiang Xiang; Wei Hu; Tianzhou Chen

Executing sequential programs on multi-core processors is crucial for accommodating instruction level parallelism (ILP) in chip multiprocessor (CMP) architectures. One widely used method of steering instructions across cores is based on dependency. However, this method requires a sophisticated steering mechanism and introduces considerable hardware complexity and area overhead. This paper presents the Global Register Alias Table (GRAT), a structure that can be used in CMP architectures to facilitate sequential program execution across cores. The GRAT also greatly reduces the area and complexity of steering instructions without requiring additional programming effort or compiler support. In our evaluation, our work performs within 5.9% of Core Fusion, a recent proposal that requires a complex steering unit.

Archive | 2010