Sanchuan Guo | Researchain

Publication

Featured research published by Sanchuan Guo.

networking architecture and storages | 2012

Wear-Resistant Hybrid Cache Architecture with Phase Change Memory

Sanchuan Guo; Zhenyu Liu; Dongsheng Wang; Haixia Wang; Guohong Li

Phase-change Random Access Memory (PRAM) is one of the most promising emerging non-volatile memory technologies, offering high density, non-volatility and low leakage power. However, the limited write endurance of PRAM prevents it from being used as a drop-in replacement for SRAM cache. Moreover, the inherently high latency and power dissipation of write operations are further obstacles for PRAM. In this paper, we study the L2 cache write operations incurred by different types of data and, accordingly, propose the Wear-Resistant Hybrid Cache Architecture (WRHCA), which optimizes the write access behavior of a hybrid L2 cache composed of SRAM and PRAM. By predicting data access patterns, WRHCA prevents write-prone data from entering the PRAM L2 cache and thereby alleviates the wear-out of PRAM. Experimental results from a trace-driven simulator demonstrate that, compared with a baseline system with a pure PRAM L2 cache, WRHCA saves 85.5% of the write operations to PRAM on average and improves performance by an average CPI reduction of 6.4%. In addition, compared with a conventional 3-level SRAM cache of the same chip area, WRHCA achieves a 60.9% reduction in power consumption.
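The placement idea in the abstract above can be illustrated with a minimal sketch. The abstract does not describe the predictor itself, so the per-block write counter, the `WRITE_THRESHOLD` value, and all class and function names below are hypothetical. The sketch only shows the general shape of steering predicted write-prone blocks to SRAM and the remaining blocks to PRAM.

```python
from collections import defaultdict

# Hypothetical threshold: a block written at least this many times is
# predicted to be write-prone. This value is NOT taken from the paper.
WRITE_THRESHOLD = 2


class WritePronePredictor:
    """Toy predictor that counts how often each block address was written."""

    def __init__(self, threshold=WRITE_THRESHOLD):
        self.threshold = threshold
        self.write_counts = defaultdict(int)

    def record_write(self, block_addr):
        self.write_counts[block_addr] += 1

    def is_write_prone(self, block_addr):
        return self.write_counts[block_addr] >= self.threshold


def choose_partition(predictor, block_addr):
    """Return the part of the hybrid L2 cache a block should be filled into."""
    return "SRAM" if predictor.is_write_prone(block_addr) else "PRAM"


predictor = WritePronePredictor()
for addr in (0x40, 0x40, 0x80):
    predictor.record_write(addr)

print(choose_partition(predictor, 0x40))  # block written twice
print(choose_partition(predictor, 0x80))  # block written once
```

A real design would likely use bounded hardware counters rather than an unbounded dictionary; the dictionary is used here only to keep the sketch short.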

international conference on image processing | 2014

Binary classification based linear rate estimation model for HEVC RDO

Zhenyu Liu; Sanchuan Guo; Dongsheng Wang

Rate-Distortion Optimization (RDO) in High Efficiency Video Coding (HEVC) improves coding efficiency but imposes intensive computation on the encoder, because the complex Syntax-based context-adaptive Binary Arithmetic Coding is performed for each candidate coding configuration. We develop a classification-based regression method to derive rate models that quickly estimate the bit cost of a quantized coefficient block from its distribution features. Experiments demonstrate that our method reduces the computation time of rate cost estimation by 28.4% on average, while the coding efficiency degradation is 0.0428 dB.
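The two-stage idea described above (classify a coefficient block, then apply a per-class linear model) might be sketched as follows. The features (non-zero count and sum of absolute levels), the classification rule, and every weight in `MODELS` are illustrative assumptions, not values from the paper; in practice the weights would be fitted by regression against the bit counts produced by the real entropy coder.

```python
def classify_block(nonzero_count, abs_sum, threshold=1.5):
    """Hypothetical binary classifier: split blocks by mean absolute level."""
    if nonzero_count == 0:
        return 0
    return 1 if abs_sum / nonzero_count > threshold else 0


# Hypothetical per-class linear models:
#   bits ~= w_nz * nonzero_count + w_abs * abs_sum + bias
MODELS = {
    0: (2.0, 0.5, 1.0),  # class 0: sparse, low-magnitude blocks
    1: (2.5, 1.0, 2.0),  # class 1: blocks with larger levels
}


def estimate_bits(coeffs):
    """Estimate the bit cost of a quantized coefficient block."""
    nonzero = [c for c in coeffs if c != 0]
    nonzero_count = len(nonzero)
    abs_sum = sum(abs(c) for c in nonzero)
    w_nz, w_abs, bias = MODELS[classify_block(nonzero_count, abs_sum)]
    return w_nz * nonzero_count + w_abs * abs_sum + bias


print(estimate_bits([1, 1, 0, 0]))   # low-magnitude block, class 0
print(estimate_bits([4, -2, 0, 0]))  # higher-magnitude block, class 1
```

The point of the structure is that the estimate costs a few multiply-adds per block instead of a full arithmetic-coding pass for every candidate configuration.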

data compression conference | 2014

Linear Rate Estimation Model for HEVC RDO Using Binary Classification Based Regression

Sanchuan Guo; Zhenyu Liu; Dongsheng Wang; Qingrui Han; Yang Song

international conference on computer design | 2013

Bayesian theory oriented Optimal Data-Provider Selection for CMP

Guohong Li; Zhenyu Liu; Sanchuan Guo; Chongmin Li; Dongsheng Wang

As the number of cores and the working sets of parallel workloads soar, shared L2 caches exhibit fewer misses than private L2 caches by making better use of all the available cache capacity. However, shared L2 caches induce higher overall L1 miss latencies because of the longer average distance between the requestor and the home node, and potential congestion at some nodes. We observe that there is a high probability that the data requested by an L1 miss resides in a neighbor node's L1 cache. In such cases, the long-distance accesses to the home nodes can potentially be avoided. To leverage this property, we propose Bayesian theory oriented Optimal Data-Provider Selection (ODPS). ODPS partitions the multi-core into clusters of 2×2 nodes and introduces the Proximity Data Prober (PDP) to detect whether an L1 miss can be served by an L1 cache within the same cluster. Furthermore, we devise the Bayesian Decision Classifier (BDC) to intelligently and adaptively select either a remote L2 cache or a neighboring L1 node as the data provider, choosing the one with the minimal miss cost according to Bayesian decision theory.
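A minimum-expected-cost decision of the kind the abstract attributes to the BDC can be sketched as below. The abstract gives no details of the classifier, so the cost model (a failed neighbor attempt pays the neighbor latency and then the home latency), the observation likelihoods, and all names and numbers are assumptions made for illustration.

```python
def posterior_neighbor_hit(prior_hit, p_obs_given_hit, p_obs_given_miss):
    """Bayes' rule: P(neighbor L1 holds the data | observation)."""
    evidence = (p_obs_given_hit * prior_hit
                + p_obs_given_miss * (1.0 - prior_hit))
    if evidence == 0.0:
        return prior_hit  # observation carries no information
    return p_obs_given_hit * prior_hit / evidence


def select_provider(p_hit, neighbor_latency, home_latency):
    """Pick the provider with the smaller expected miss cost.

    Assumed cost model: trying the neighbor always costs neighbor_latency,
    and with probability (1 - p_hit) the request must still go to the home
    node afterwards.
    """
    cost_neighbor_first = neighbor_latency + (1.0 - p_hit) * home_latency
    cost_home_direct = home_latency
    if cost_neighbor_first < cost_home_direct:
        return "neighbor_L1"
    return "home_L2"


# Hypothetical numbers: 50% prior, observation 4x more likely under a hit.
p = posterior_neighbor_hit(prior_hit=0.5, p_obs_given_hit=0.8,
                           p_obs_given_miss=0.2)
print(select_provider(p, neighbor_latency=10, home_latency=40))
```

Under this cost model the neighbor is preferred only when the hit probability exceeds `neighbor_latency / home_latency`, which is why a low-confidence prediction falls back to the home node.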

symposium on computer architecture and high performance computing | 2013

Cluster Cache Monitor

Guohong Li; Olivier Temam; Zhenyu Liu; Dongsheng Wang; Sanchuan Guo; Chongmin Li

As the number of cores and the working sets of parallel workloads increase, shared L2 caches exhibit fewer misses than private L2 caches by making better use of the total available cache capacity, but they also induce higher overall L1 miss latencies because of the longer average distance between two nodes and the potential congestion at certain nodes. One of the main causes of the long L1 miss latencies is accesses to the home nodes of the directory. However, we have observed that there is a high probability that the target data of an L1 miss resides in the L1 cache of a neighbor node. In such cases, the long-distance accesses to the home nodes can potentially be avoided. We organize the multi-core into clusters of 2×2 nodes and, in order to leverage this property, introduce the Cluster Cache Monitor (CCM). The CCM is a hardware structure in charge of detecting whether an L1 miss can be served by one of the cluster L1 caches; we also add two cluster-related states to the coherence protocol in order to avoid long-distance accesses to home nodes upon hits in the cluster L1 caches. We evaluate this approach on a 64-node multi-core using the SPLASH-2 and PARSEC benchmarks, and we find that the CCM can reduce the execution time by 15% and the energy by 14%, while saving 28% of the directory storage area compared to a standard multi-core with a shared L2. We also show that the CCM outperforms recent mechanisms such as ASR, DCC and RNUCA.
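The 2×2 clustering of a 64-node system can be made concrete with a small sketch. The abstract states the node count and the cluster size but not the topology or numbering, so the 8×8 mesh, the row-major node IDs, and the function name are assumptions.

```python
def cluster_members(node_id, mesh_width=8, cluster_dim=2):
    """Return the node IDs that share a cluster with node_id.

    Assumes a square mesh with row-major node numbering (hypothetical
    layout; the abstract only states 64 nodes and 2x2 clusters).
    """
    row, col = divmod(node_id, mesh_width)
    base_row = row - row % cluster_dim  # top row of the cluster
    base_col = col - col % cluster_dim  # left column of the cluster
    return [
        (base_row + dr) * mesh_width + (base_col + dc)
        for dr in range(cluster_dim)
        for dc in range(cluster_dim)
    ]


print(cluster_members(9))   # nodes sharing a cluster with node 9
print(cluster_members(63))  # nodes sharing a cluster with node 63
```

With this layout, an L1 miss at a node would first be checked against the three other L1 caches returned here before any request travels to a distant home node.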

international symposium on circuits and systems | 2013

Content-aware write reduction mechanism of phase-change RAM based Frame Store in H.264 Video codec system

Sanchuan Guo; Zhenyu Liu; Guohong Li; Dongsheng Wang

An H.264 video codec system requires a large-capacity Frame Store (FS) for buffering reference frames. The emerging Phase-change Random Access Memory (PRAM) is a promising approach for on-chip caching of the reference signals, as it offers high density and low leakage power. However, the write endurance problem, namely that a PRAM cell can tolerate only a limited number of write operations, is the main barrier to practical application. This paper studies wear reduction techniques for a PRAM-based FS in an H.264 codec system. On the basis of rate-distortion theory, content-oriented selective writing mechanisms are proposed to reduce bit updates in the reference frame buffers. Experiments demonstrate that, for typical video sequences with different frame sizes, our methods achieve more than 30% reduction of bit updates on average, while introducing around 20% BDBR cost. The power consumption is reduced by 55% on average, and the estimated PRAM lifetime is extended by 61%.
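To show why selective writing can cut bit updates at a small distortion cost, here is a minimal sketch. The abstract does not specify the selection rule, so the `tolerance` parameter and the "keep the stored value if the new pixel is close enough" policy are hypothetical stand-ins for the paper's content-oriented mechanisms.

```python
def bit_updates(old_pixel, new_pixel):
    """Number of memory bits that change when old_pixel is overwritten."""
    return bin(old_pixel ^ new_pixel).count("1")


def selective_write(old_pixel, new_pixel, tolerance=1):
    """Hypothetical policy: skip the write when the values nearly match.

    Returns (stored_value, bits_updated).
    """
    if abs(new_pixel - old_pixel) <= tolerance:
        return old_pixel, 0  # keep the stored value, accept small distortion
    return new_pixel, bit_updates(old_pixel, new_pixel)


# 127 -> 128 differs by 1 in value but flips all 8 bits of the byte.
print(bit_updates(127, 128))
print(selective_write(127, 128))
print(selective_write(100, 110))
```

The 127 to 128 case is the motivating example: the pixel value changes by one, yet a plain write would update every bit of the byte, so skipping it trades a distortion of one level for eight saved bit updates.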

Archive | 2010