Network


Latest external collaborations at the country level.

Hotspot


Dive into the research topics where Xueqing Lou is active.

Publication


Featured research published by Xueqing Lou.


Journal of Systems Architecture | 2018

Last level cache layout remapping for heterogeneous systems

Licheng Yu; Tianzhou Chen; Minghui Wu; Xueqing Lou

Heterogeneous systems with the CPU and GPGPU sharing the last level cache (LLC) provide viability and flexibility. However, the different programming models lead to conflicting memory layouts, each required for the best performance of a different processor. Software conversion that directly accesses the target layout suffers from sub-optimal locality, and conversion in GPGPU shared memory incurs copying and synchronization overhead. In this paper, we analyze the memory layout requirements and propose remapping the memory layout in the shared LLC. A remap controller in the LLC executes a simple program that calculates target requests from an LLC request in the source memory space. The LLC request is thus remapped to the target memory space with the generated requests. Consequently, all processors always access memory in their optimal data layouts. Locality is thus preserved through all the private caches, and software remapping overhead is eliminated. Tiled matrix multiplication is discussed as a case study, and benchmarks from Polybench/GPU and Rodinia are modified to take advantage of the LLC layout remapping. The experimental results show the average benchmark execution time is reduced to 69% of the original. Compared with software layout conversion on the CPU, CPU time is reduced to 41%–73%.
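To make the remapping idea concrete, here is a minimal C sketch, under assumed parameters (a square N×N matrix with T×T tiles, both hypothetical), of the kind of address calculation the remap controller's small program could perform: translating an element's offset in the row-major source layout into its offset in a tiled target layout. It is illustrative only, not the paper's actual controller program.

```c
/* Illustrative sketch (not the paper's actual controller program): translate a
 * row-major element offset into a tiled-layout offset, the kind of calculation
 * an LLC remap controller could perform so the CPU sees row-major data while
 * the GPGPU sees a tiled layout. N and T are hypothetical parameters. */
#include <stdio.h>
#include <stddef.h>

#define N 1024   /* matrix is N x N elements */
#define T 32     /* tile is T x T elements   */

/* Offset of element (row, col) in the row-major source layout. */
static size_t rowmajor_offset(size_t row, size_t col) {
    return row * N + col;
}

/* Offset of the same element in the tiled target layout: tiles are stored
 * one after another, row-major within each tile. */
static size_t tiled_offset(size_t row, size_t col) {
    size_t tile_id = (row / T) * (N / T) + (col / T);
    return tile_id * T * T + (row % T) * T + (col % T);
}

int main(void) {
    /* A request to element (5, 70) in the source space maps to a different
     * offset in the target space; the remap hardware would issue the
     * corresponding target request transparently. */
    size_t row = 5, col = 70;
    printf("source offset %zu -> target offset %zu\n",
           rowmajor_offset(row, col), tiled_offset(row, col));
    return 0;
}
```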


The Journal of Supercomputing | 2017

Enable back memory and global synchronization on LLC buffer

Licheng Yu; Yulong Pei; Tianzhou Chen; Xueqing Lou; Minghui Wu; Tiefei Zhang

The last-level cache (LLC) shared by heterogeneous processors such as the CPU and general-purpose graphics processing unit (GPGPU) brings new opportunities to optimize data sharing between them. Previous work introduces the LLC buffer, which uses part of the LLC storage as a FIFO buffer to enable data sharing between the CPU and GPGPU with negligible management overhead. However, the baseline LLC buffer's capacity is limited, and it can deadlock when the buffer is full. It also relies on inefficient CPU kernel relaunch and high-overhead atomic operations on the GPGPU for global synchronization. These limitations motivate us to add back memory and global synchronization to the baseline LLC buffer and make it more practical. The back memory divides the buffer storage into two levels. While they are managed as a single queue, the data storage in each level is managed as an individual circular buffer. Data are redirected to the memory level when the LLC level is full, and are loaded back to the LLC level when it has free space. The n-queens case study shows that the back memory performs comparably to an LLC buffer with an infinite LLC level, whereas an LLC buffer without back memory exhibits a 10% performance degradation caused by buffer space contention. Global synchronization is enabled by peeking at the data about to be read from the buffer: any request to read data in the LLC buffer past the global barrier is allowed only when all threads have reached the barrier. We adopt breadth-first search (BFS) as a case study and compare the LLC buffer with an optimized GPGPU implementation of BFS. The results show the LLC buffer achieves an average speedup of 1.70. The global synchronization time on the GPGPU and CPU is decreased to 38 and 60–5%, respectively.
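As a rough illustration of the two-level idea, the following C sketch models the back memory in software: a small "LLC level" ring and a larger "memory level" ring managed as one FIFO, with enqueues spilling to the memory level when the LLC level is full and dequeues loading spilled data back. The capacities, names, and software-only setting are assumptions for illustration, not the hardware design, and the peek-based global synchronization is not modeled.

```c
/* Minimal software model of the back memory idea (an illustrative assumption,
 * not the hardware design): a small "LLC level" ring backed by a larger
 * "memory level" ring, managed together as a single FIFO queue. */
#include <stdio.h>

#define LLC_CAP 4      /* hypothetical LLC-level capacity (entries)    */
#define MEM_CAP 16     /* hypothetical memory-level capacity (entries) */

typedef struct { int buf[MEM_CAP]; int head, tail, count, cap; } ring;

static int ring_push(ring *r, int v) {
    if (r->count == r->cap) return 0;
    r->buf[r->tail] = v; r->tail = (r->tail + 1) % r->cap; r->count++; return 1;
}
static int ring_pop(ring *r, int *v) {
    if (r->count == 0) return 0;
    *v = r->buf[r->head]; r->head = (r->head + 1) % r->cap; r->count--; return 1;
}

static ring llc = { .cap = LLC_CAP }, mem = { .cap = MEM_CAP };

/* Producer: use the LLC level while that preserves FIFO order, else spill. */
static int q_push(int v) {
    if (mem.count == 0 && ring_push(&llc, v)) return 1;
    return ring_push(&mem, v);          /* redirect to the memory level */
}

/* Consumer: drain the LLC level, then load spilled data back into it. */
static int q_pop(int *v) {
    if (!ring_pop(&llc, v) && !ring_pop(&mem, v)) return 0;
    int spilled;
    while (llc.count < LLC_CAP && ring_pop(&mem, &spilled))
        ring_push(&llc, spilled);       /* refill freed LLC-level space */
    return 1;
}

int main(void) {
    for (int i = 0; i < 10; i++) q_push(i);   /* 0-3 fit in the LLC level, 4-9 spill */
    for (int i = 0, v; i < 10 && q_pop(&v); i++) printf("%d ", v);
    printf("\n");                             /* prints 0..9 in FIFO order */
    return 0;
}
```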


International Conference on Embedded Software and Systems | 2016

Two Methods for Combining Original Memory Access Coalescing and Equivalent Memory Access Coalescing on GPGPU

Yulong Pei; Licheng Yu; Minghui Wu; Tianzhou Chen; Xueqing Lou; Tiefei Zhang

The modern GPU has a powerful parallel processing unit and a programmable pipeline, so it has advantages for non-graphics computing. However, the GPU demands a large amount of memory access bandwidth. The GPU uses memory access coalescing to reduce memory access requests when they have good locality. Paper [1] proposes equivalent memory access coalescing to improve memory access performance when a program's memory access requests have poor locality. The original memory access coalescing and equivalent memory access coalescing can therefore complement each other. In this paper, we propose two methods to combine these two memory access coalescing schemes. In the experiments, we choose 30 benchmarks from two suites. With the first method, the memory access performance of 20 benchmarks is improved, with an average speedup of 141.4%. With the second method, 27 benchmarks select the better of the two coalescing methods.
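To illustrate what the baseline original coalescing does, here is a small C model, under my own assumptions (32-thread warps, 128-byte memory transactions), that counts how many transactions one warp's addresses coalesce into. Contiguous per-thread accesses collapse into a single transaction, while strided accesses do not, which is the poor-locality case that equivalent coalescing targets; the two combination methods themselves are not modeled here.

```c
/* Illustrative model (my assumption, not the paper's hardware) of original
 * memory access coalescing: the 32 addresses issued by one warp are merged
 * into unique 128-byte memory transactions. */
#include <stdio.h>

#define WARP_SIZE 32
#define LINE_BYTES 128   /* one memory transaction covers 128 bytes */

/* Count distinct 128-byte segments touched by a warp's addresses. */
static int coalesced_transactions(const unsigned long addr[WARP_SIZE]) {
    unsigned long seen[WARP_SIZE];
    int n = 0;
    for (int t = 0; t < WARP_SIZE; t++) {
        unsigned long seg = addr[t] / LINE_BYTES;
        int dup = 0;
        for (int i = 0; i < n; i++) if (seen[i] == seg) { dup = 1; break; }
        if (!dup) seen[n++] = seg;
    }
    return n;
}

int main(void) {
    unsigned long contiguous[WARP_SIZE], strided[WARP_SIZE];
    for (int t = 0; t < WARP_SIZE; t++) {
        contiguous[t] = 0x1000 + 4ul * t;      /* thread t reads consecutive words */
        strided[t]    = 0x1000 + 256ul * t;    /* thread t reads 256 bytes apart   */
    }
    printf("contiguous pattern: %d transactions\n", coalesced_transactions(contiguous));
    printf("strided pattern:    %d transactions\n", coalesced_transactions(strided));
    return 0;
}
```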


High Performance Computing and Communications | 2016

WAP: The Warp Feature Aware Prefetching Method for LLC on CPU-GPU Heterogeneous Architecture

Minghui Wu; Yulong Pei; Licheng Yu; Tianzhou Chen; Xueqing Lou; Tiefei Zhang

Researchers have found that the GPU has advantages for non-graphics computing. A CPU-GPU heterogeneous architecture combines the CPU and GPU on one chip and makes it easier for the GPU to run non-graphics programs. Researchers have also proposed using the LLC (last-level cache) to store and exchange data between the CPU and GPU. We find that the LLC hit rate has a large influence on memory access performance and overall system performance. Therefore, we propose WAP (the warp-feature-aware prefetching method) to improve the LLC hit rate and memory access performance. We combine GPGPU-Sim and gem5 into a CPU-GPU heterogeneous many-core simulator, add an LLC to this simulator, and choose 10 representative benchmarks. We compare this method with the MAP method. The experimental results show that WAP improves the LLC hit rate by 11.8% and IPC by 11.39% over MAP.
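The abstract does not detail WAP's internal mechanism, so the sketch below is only a generic per-warp stride prefetcher in C, written under my own assumptions, to show where warp-aware state could live: each warp gets a table entry holding its last address and stride, and a prefetch for the next expected line is issued once the stride repeats.

```c
/* Generic per-warp stride-prefetcher sketch (an assumption for illustration,
 * not WAP itself): track the last address and stride per warp, and predict
 * the next line once the same stride is observed twice in a row. */
#include <stdio.h>

#define MAX_WARPS 64
#define LINE 128ul   /* hypothetical LLC line size in bytes */

typedef struct { unsigned long last_addr; long stride; int valid; } warp_entry;
static warp_entry table[MAX_WARPS];

/* Called on every LLC access from a GPU warp; returns the address to
 * prefetch, or 0 when no confident prediction exists. */
static unsigned long on_llc_access(int warp_id, unsigned long addr) {
    warp_entry *e = &table[warp_id % MAX_WARPS];
    unsigned long prefetch = 0;
    if (e->valid && (long)(addr - e->last_addr) == e->stride && e->stride != 0)
        prefetch = addr + e->stride;             /* stride repeated: prefetch */
    e->stride = e->valid ? (long)(addr - e->last_addr) : 0;
    e->last_addr = addr;
    e->valid = 1;
    return prefetch;
}

int main(void) {
    /* Warp 3 streams through memory one line at a time; after the stride has
     * been seen twice, the sketch starts predicting the next line. */
    for (int i = 0; i < 4; i++) {
        unsigned long p = on_llc_access(3, 0x4000 + i * LINE);
        printf("access %d -> prefetch %#lx\n", i, p);
    }
    return 0;
}
```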


Archive | 2011

Digital signature method of movable Widget

Licheng Yu; Tianzhou Chen; Minghui Wu; Hui Yan; Xueqing Lou


Archive | 2011

Method for safely accessing network resource by mobile widget

Shaobin Zhang; Tianzhou Chen; Minghui Wu; Hui Yan; Xueqing Lou


Archive | 2011

Method for updating and checking mobile widget client

Shaobin Zhang; Tianzhou Chen; Minghui Wu; Hui Yan; Xueqing Lou


Archive | 2011

Localization method for mobile Widget

Licheng Yu; Tianzhou Chen; Minghui Wu; Hui Yan; Xueqing Lou


IEEE Conference Proceedings | 2016

LLC buffer for arbitrary data sharing in heterogeneous systems

Licheng Yu; Yulong Pei; Tianzhou Chen; Xueqing Lou; Minghui Wu; Tiefei Zhang


Archive | 2011

Installation method of mobile Widget package

Licheng Yu; Tianzhou Chen; Minghui Wu; Hui Yan; Xueqing Lou

Collaboration


Dive into Xueqing Lou's collaborations.

Top Co-Authors

Tiefei Zhang

Zhejiang Gongshang University
