Publication


Featured research published by Shaoshan Liu.


IEEE Computer Architecture Letters | 2011

Prefetching in Embedded Mobile Systems Can Be Energy-Efficient

Jie Tang; Shaoshan Liu; Zhimin Gu; Chen Liu; Jean-Luc Gaudiot

Data prefetching has been a successful technique in high-performance computing platforms. However, the conventional wisdom is that it significantly increases energy consumption and is thus not suitable for embedded mobile systems. On the other hand, as modern mobile applications pose an increasing demand for high performance, it becomes essential to implement high-performance techniques, such as prefetching, in these systems. In this paper, we study the impact of prefetching on the performance and energy consumption of embedded mobile systems. Contrary to the conventional wisdom, our findings demonstrate that as technology advances, prefetching can be energy-efficient while improving performance. Furthermore, we have developed a simple but effective analytical model to help system designers identify the conditions for energy efficiency.
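The paper's analytical model is not reproduced in the abstract; as a minimal sketch, the kind of break-even condition it identifies can be caricatured as follows. The variable names and the linear energy model below are illustrative assumptions, not the paper's actual model.

```python
# Hypothetical break-even test: prefetching is energy-efficient when the
# static energy avoided by finishing earlier outweighs the extra dynamic
# energy the prefetcher consumes. All parameters are illustrative.

def prefetch_is_energy_efficient(t_base, t_prefetch, p_static, e_prefetch_dynamic):
    """t_base/t_prefetch: runtime (s) without/with prefetching;
    p_static: static power draw (W); e_prefetch_dynamic: extra energy (J)."""
    static_energy_saved = p_static * (t_base - t_prefetch)
    return static_energy_saved > e_prefetch_dynamic

# Example: a 10% speedup on a 1.0 s run at 0.5 W static power saves 50 mJ,
# which more than covers a 30 mJ prefetcher energy overhead.
print(prefetch_is_energy_efficient(1.0, 0.9, 0.5, 0.03))  # True
```

As the abstract argues, technology scaling shrinks the dynamic overhead term while static power grows, tilting this inequality in prefetching's favor.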


IEEE Transactions on Computers | 2009

Potential Impact of Value Prediction on Communication in Many-Core Architectures

Shaoshan Liu; Jean-Luc Gaudiot

The newly emerging many-core-on-a-chip designs have renewed an intense interest in parallel processing. By applying Amdahl's formulation to the programs in the PARSEC and SPLASH-2 benchmark suites, we find that most applications may not have sufficient parallelism to efficiently utilize modern parallel machines. The long sequential portions in these application programs are caused by computation as well as communication latency. However, value prediction techniques may allow the "parallelization" of the sequential portion by predicting values before they are produced. In conventional superscalar architectures, the computation latency dominates the sequential sections. Thus, value prediction techniques may be used to predict the computation result before it is produced. In many-core architectures, since the communication latency increases with the number of cores, value prediction techniques may be used to reduce both the communication and the computation latency. In this paper, we extend Amdahl's formulation to model the data redundancy inherent to each benchmark, thereby identifying the potential of value prediction techniques. Our analysis shows that performance may improve by 180 percent for the PARSEC benchmarks and by 230 percent for the SPLASH-2 suite, compared to when only the intrinsic parallelism is considered. This demonstrates the immense potential of fine-grained value prediction in reducing the communication latency in many-core architectures.
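The intuition behind the extension can be sketched numerically. The paper models per-benchmark data redundancy; the single "predictable fraction" parameter below is a simplifying assumption standing in for that model.

```python
# Illustrative Amdahl's-law calculation: value prediction effectively
# "parallelizes" the predictable part of the serial fraction.

def amdahl_speedup(serial_frac, cores):
    """Classic Amdahl speedup for a given serial fraction and core count."""
    return 1.0 / (serial_frac + (1.0 - serial_frac) / cores)

def speedup_with_prediction(serial_frac, predictable_frac, cores):
    """Assume the predictable share of the serial section overlaps with
    parallel work, shrinking the effective serial fraction."""
    effective_serial = serial_frac * (1.0 - predictable_frac)
    return amdahl_speedup(effective_serial, cores)

base = amdahl_speedup(0.2, 64)                     # ~4.7x on 64 cores
vp = speedup_with_prediction(0.2, 0.5, 64)         # half the serial part predicted
print(round(vp / base, 2))                         # ~1.86x additional gain
```

Even a modest predictable fraction yields a large gain, because the serial term dominates at high core counts.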


IEEE Transactions on Computers | 2013

Acceleration of XML Parsing through Prefetching

Jie Tang; Shaoshan Liu; Chen Liu; Zhimin Gu; Jean-Luc Gaudiot

Extensible Markup Language (XML) has become a widely adopted standard for data representation and exchange. However, its features also introduce significant overhead, threatening the performance of modern applications. In this paper, we present a study of XML parsing and determine that memory-side data loading in the parsing stage incurs a significant performance overhead, as much as the computation does. Hence, we propose memory-side acceleration, which incorporates data prefetching techniques and can be applied on top of computation-side acceleration to speed up XML data parsing. To this end, we study the impact of our proposed scheme on performance and energy consumption and demonstrate that it can improve performance by up to 20 percent and save up to 12.77 percent of energy when implemented in 32-nm technology. In addition, we implement a prefetcher on a hardware platform to evaluate its implementation feasibility in terms of area and energy overhead.
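The core idea of overlapping memory-side data loading with the parsing computation can be illustrated with a toy software pipeline. This double-buffered sketch is an assumption-laden analogue, not the hardware prefetcher the paper implements.

```python
# Toy analogue of memory-side acceleration: a loader thread "prefetches" the
# next chunk of XML while the main thread parses the current one.
import threading
import queue

def parse(chunk):
    # Stand-in for real XML parsing work: count opening angle brackets.
    return chunk.count(b"<")

def pipelined_parse(chunks):
    q = queue.Queue(maxsize=1)  # at most one chunk prefetched ahead
    def loader():
        for c in chunks:
            q.put(c)            # data loading proceeds independently
        q.put(None)             # sentinel: no more data
    threading.Thread(target=loader, daemon=True).start()
    total = 0
    while (c := q.get()) is not None:
        total += parse(c)       # parsing overlaps the next chunk's load
    return total

print(pipelined_parse([b"<a><b/></a>", b"<c/>"]))  # 4
```

In hardware, the same overlap is achieved by a prefetcher that stages parse input into the cache ahead of the parser, hiding memory latency behind computation.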


International Conference on Parallel Processing | 2010

Speculative Execution on GPU: An Exploratory Study

Shaoshan Liu; Christine Eisenbeis; Jean-Luc Gaudiot

We explore the possibility of using GPUs for speculative execution: we implement software value prediction techniques to accelerate programs with limited parallelism, and software speculation techniques to accelerate programs that contain runtime parallelism and are hard to parallelize statically. Our experimental results show that, due to the relatively high overhead, mapping software value prediction techniques onto existing GPUs may not bring any immediate performance gain. On the other hand, although software speculation techniques introduce some overhead as well, mapping these techniques onto existing GPUs can already bring some performance gain over the CPU.


The Journal of Supercomputing | 2012

Achieving middleware execution efficiency: hardware-assisted garbage collection operations

Jie Tang; Shaoshan Liu; Zhimin Gu; Xiao-Feng Li; Jean-Luc Gaudiot

Although virtualization technologies bring many benefits to cloud computing environments, as virtual machines provide more features, the middleware layer has become bloated, introducing a high overhead. Our ultimate goal is to provide hardware-assisted solutions to improve middleware performance in cloud computing environments. As a starting point, in this paper, we design, implement, and evaluate specialized hardware instructions to accelerate garbage collection (GC) operations. We select GC because it is a common component in virtual machine designs and it incurs high performance and energy consumption overheads. We performed a profiling study on various GC algorithms to identify the GC performance hotspots, which contribute more than 50% of the total GC execution time. By moving these hotspot functions into hardware, we achieved an order-of-magnitude speedup and a significant improvement in energy efficiency. In addition, the results of our performance estimation study indicate that the hardware-assisted GC instructions can reduce the GC execution time by half and lead to a 7% improvement in the overall execution time.


International Conference on Application-Specific Systems, Architectures and Processors | 2010

On energy efficiency of reconfigurable systems with run-time partial reconfiguration

Shaoshan Liu; Richard Neil Pittman; Alessandro Forin; Jean-Luc Gaudiot

In this paper, we study whether partial reconfiguration can be used to reduce FPGA energy consumption. In an ideal scenario, a hardware accelerator assists with certain parts of program execution; when the accelerator is not active, we use partial reconfiguration to unload it and thus reduce both static and dynamic power. However, the reconfiguration process may introduce a high energy overhead, so it is unclear whether this approach is feasible. To address this problem, we identify the conditions under which partial reconfiguration can be used to reduce energy consumption, and we propose solutions to minimize the reconfiguration energy overhead. The results of our study show that by using partial reconfiguration to reduce the power consumption of the accelerator when it is inactive, we can accelerate program execution and at the same time halve the overall energy consumption.
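The feasibility question above reduces to a break-even condition: unloading an idle accelerator pays off only if the static energy saved during the idle interval exceeds the energy spent reconfiguring. The following sketch and all its numbers are illustrative assumptions, not the paper's measured values.

```python
# Hedged sketch of the break-even condition for run-time partial
# reconfiguration: compare static energy saved while the accelerator region
# is blanked against the energy cost of reconfiguring twice (unload + reload).

def worth_unloading(idle_time_s, p_static_accel_w, e_reconfig_j):
    saved = p_static_accel_w * idle_time_s   # static energy the blank region avoids
    cost = 2 * e_reconfig_j                  # blank the region now, restore it later
    return saved > cost

# Minimum idle time for a 50 mW static draw and 10 mJ per reconfiguration:
min_idle = 2 * 0.010 / 0.050
print(min_idle)  # 0.4 s: shorter idle periods are not worth reconfiguring
```

Identifying this threshold, and shrinking the reconfiguration cost term, is exactly what makes the approach net-positive in the paper's results.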


International Journal of Parallel Programming | 2011

Value Prediction and Speculative Execution on GPU

Shaoshan Liu; Christine Eisenbeis; Jean-Luc Gaudiot

GPUs and CPUs have fundamentally different architectures. It is conventional wisdom that GPUs can accelerate only those applications that exhibit very high parallelism, especially vector parallelism such as image processing. In this paper, we explore the possibility of using GPUs for value prediction and speculative execution: we implement software value prediction techniques to accelerate programs with limited parallelism, and software speculation techniques to accelerate programs that contain runtime parallelism and are hard to parallelize statically. Our experimental results show that, due to the relatively high overhead, mapping software value prediction techniques onto existing GPUs may not bring any immediate performance gain. On the other hand, although software speculation techniques introduce some overhead as well, mapping these techniques onto existing GPUs can already bring some performance gain over the CPU. Based on these observations, we explore the hardware implementation of speculative execution operations on GPU architectures to reduce the software performance overheads. The results indicate that the hardware extensions yield an almost tenfold reduction in control-divergent sequential operations with only moderate hardware (5–8%) and power consumption (1–5%) overheads.
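The predict-then-verify pattern underlying software value prediction can be sketched in a few lines. The last-value predictor and the recovery policy below are simplifying assumptions in the spirit of the techniques the paper maps onto GPUs, not its actual implementation.

```python
# Sketch of software value prediction with mis-speculation recovery: run the
# consumer speculatively on a predicted input; re-execute only on mispredict.

def predict_and_verify(produce_value, consume, last_value):
    """last_value acts as a last-value predictor for the producer's output.
    Returns (consumer result, actual produced value)."""
    speculative = consume(last_value)   # on a GPU this overlaps produce_value
    actual = produce_value()
    if actual == last_value:
        return speculative, actual      # prediction correct: work is reused
    return consume(actual), actual      # mispredict: discard and re-execute

result, new_last = predict_and_verify(lambda: 7, lambda v: v * v, 7)
print(result)  # 49
```

The abstract's finding is that on GPUs the verification and recovery machinery costs more in software than the overlap gains, which motivates the proposed hardware support.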


Asia-Pacific Computer Systems Architecture Conference | 2007

Synchronization mechanisms on modern multi-core architectures

Shaoshan Liu; Jean-Luc Gaudiot

While the semiconductor industry has provided us with powerful systems for personal supercomputing, how to efficiently harness the computing power of these systems remains a major unsolved problem. This challenge must be approached by simultaneously solving the synchronization problem and the parallel programmability problem. This paper reviews the synchronization issues in modern parallel computer architectures, surveys the state-of-the-art approaches used to alleviate these problems, and proposes our Request-Store-Forward (RSF) model of synchronization. This model splits atomic synchronization operations into two phases, thus freeing the processing elements from polling operations. Finally, we show how we could learn from nature and improve overall system performance by closely coupling peripheral computing units and functional units.
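The two-phase idea can be sketched as follows: a requester registers interest in a synchronization variable and continues (no spinning); when the producer stores, the value is forwarded to all pending requesters. The class name, callback structure, and single-variable scope here are illustrative assumptions, not the RSF hardware design.

```python
# Toy sketch of the Request-Store-Forward split: phase 1 stores the request
# at the variable; phase 2 forwards the value when the store arrives.

class RSFVariable:
    def __init__(self):
        self.value = None
        self.ready = False
        self.pending = []                    # callbacks of waiting requesters

    def request(self, on_forward):
        if self.ready:
            on_forward(self.value)           # value already stored: forward now
        else:
            self.pending.append(on_forward)  # phase 1: request is parked here

    def store(self, value):
        self.value, self.ready = value, True
        for cb in self.pending:              # phase 2: forward to requesters
            cb(value)
        self.pending.clear()

got = []
var = RSFVariable()
var.request(got.append)   # requester proceeds without polling
var.store(42)             # producer's store triggers the forward
print(got)  # [42]
```

The point of the split is that between request and forward, the processing element does useful work instead of repeatedly polling the synchronization variable.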


International Conference on Application-Specific Systems, Architectures and Processors | 2010

Hardware-assisted middleware: Acceleration of garbage collection operations

Jie Tang; Shaoshan Liu; Zhimin Gu; Xiao-Feng Li; Jean-Luc Gaudiot

Although virtualization technology brings many benefits to cloud computing environments, as virtual machines provide more features, the middleware layer has become bloated, introducing a high overhead. Our ultimate goal is to provide hardware-assisted solutions to improve middleware performance in cloud computing environments. As a starting point, in this paper, we design, implement, and evaluate specialized hardware instructions to accelerate garbage collection (GC) operations. We select GC because it is a common component in virtual machine designs and it incurs high performance and energy consumption overheads. We performed a profiling study on various GC algorithms to identify the GC performance hotspots, which contribute more than 50% of the total GC execution time. By moving these hotspot functions into hardware, we achieved an order-of-magnitude speedup.


IEEE Transactions on Computers | 2012

Packer: Parallel Garbage Collection Based on Virtual Spaces

Shaoshan Liu; Jie Tang; Ligang Wang; Xiao-Feng Li; Jean-Luc Gaudiot

The fundamental challenge of garbage collector (GC) design is to maximize the recycled space with minimal time overhead. For efficient memory management, in many GC designs the heap is divided into large object space (LOS) and normal object space (non-LOS). When either space is full, garbage collection is triggered even though the other space may still have plenty of room, thus leading to inefficient space utilization. Also, space partitioning in existing GC designs implies different GC algorithms for different spaces. This not only prolongs the pause time of garbage collection, but also makes collection inefficient across multiple spaces. To address these problems, we propose Packer, a parallel garbage collection algorithm based on the novel concept of virtual spaces. Instead of physically dividing the heap into multiple spaces, Packer manages multiple virtual spaces in one physical space. With multiple virtual spaces, Packer offers efficient memory management. With one physical space, Packer avoids the problem of inefficient space utilization. To reduce the garbage collection pause time, we also propose a novel parallelization method that is applicable to multiple virtual spaces. Specifically, we reduce the compacting GC parallelization problem to a directed acyclic graph (DAG) traversal parallelization problem, and apply it to both normal and large object compaction.
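The reduction to DAG traversal can be illustrated with a small scheduling sketch: a heap block can only be compacted after the blocks it will be copied into have been evacuated, so a valid compaction order is a topological order of that dependency graph. The graph representation and worklist scheme below are assumptions for illustration, not Packer's implementation.

```python
# Illustrative DAG-traversal scheduling for parallel compaction: deps[b] lists
# the blocks that must be compacted before block b (its copy targets).
from collections import deque

def dag_traversal_order(deps):
    """Kahn's topological sort. At any moment, every block sitting in the
    worklist is independent and could be compacted by a parallel GC thread."""
    indeg = {b: len(d) for b, d in deps.items()}
    rev = {b: [] for b in deps}
    for b, d in deps.items():
        for p in d:
            rev[p].append(b)          # p must finish before b can start
    work = deque(b for b, n in indeg.items() if n == 0)
    order = []
    while work:
        b = work.popleft()
        order.append(b)               # "compact" block b
        for succ in rev[b]:
            indeg[succ] -= 1
            if indeg[succ] == 0:      # all of succ's targets are evacuated
                work.append(succ)
    return order

print(dag_traversal_order({"A": [], "B": ["A"], "C": ["A"], "D": ["B", "C"]}))
```

Here B and C become ready simultaneously after A, which is exactly the parallelism a worklist-driven compactor exploits across GC threads.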

Collaboration


Shaoshan Liu's top co-authors:

Jie Tang (Beijing Institute of Technology)
Zhimin Gu (Beijing Institute of Technology)