Shouzhen Gu
Chongqing University
Publication
Featured research published by Shouzhen Gu.
IEEE Transactions on Parallel and Distributed Systems | 2014
Jing Liu; Qingfeng Zhuge; Shouzhen Gu; Jingtong Hu; Guangyu Zhu; Edwin Hsing-Mean Sha
High-performance computing systems typically employ heterogeneous multicore designs to improve both execution performance and efficiency. Task assignment is critical in exploiting the diversity of computation capability, energy consumption, and communication cost on heterogeneous multicore processors. In this paper, we explore task assignment on heterogeneous multicore processors to minimize execution and communication costs under a time constraint. The general heterogeneous task assignment problem is NP-Complete. However, we find that optimal task assignment can be achieved for widely used, tree-shaped task graphs using dynamic programming. We first propose a dynamic programming algorithm, the Optimal Tree Assign (OTA) algorithm, to generate optimal assignments for trees. Then, we develop an Integer Linear Programming model of the general task assignment problem for Directed Acyclic Graphs. A polynomial-time heuristic, the Extended Tree Assignment algorithm, is also proposed to produce near-optimal solutions for the general heterogeneous task assignment problem efficiently. The experimental results show that the proposed algorithms outperform both the homogeneous task assignment method and the greedy strategy for all the benchmarks. The OTA algorithm reduces the total system time by 42.5 percent and 23.5 percent on average compared with the homogeneous task assignment method and the greedy algorithm, respectively.
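The abstract does not spell out the OTA algorithm, so the following is only a minimal sketch of the general idea of dynamic programming on a tree-shaped task graph. It assumes a simplified model that is not taken from the paper: each task has one (time, cost) option per core type, times are integers, communication costs are ignored, and the time constraint bounds every root-to-leaf path. The function and parameter names are hypothetical.

```python
from functools import lru_cache

INF = float("inf")


def min_cost_tree_assignment(children, options, root, deadline):
    """Return the minimum total cost of assigning every task in a tree.

    children: dict mapping a task to the list of its child tasks
    options:  dict mapping a task to a list of (time, cost) pairs,
              one pair per core type (assumed model, integer times)
    deadline: upper bound on the summed time of any root-to-leaf path
    Returns float('inf') when no assignment meets the deadline.
    """

    @lru_cache(maxsize=None)
    def best(task, budget):
        # Cheapest way to run `task` and its whole subtree so that every
        # path starting at `task` finishes within `budget` time units.
        result = INF
        for time, cost in options[task]:
            if time > budget:
                continue  # this core type is too slow for the remaining budget
            total = cost
            for child in children.get(task, []):
                total += best(child, budget - time)
            result = min(result, total)
        return result

    return best(root, deadline)


# Hypothetical three-task tree: "a" is the root with children "b" and "c".
tree = {"a": ["b", "c"]}
opts = {
    "a": [(1, 10), (3, 2)],
    "b": [(1, 8), (2, 3)],
    "c": [(2, 5), (4, 1)],
}
print(min_cost_tree_assignment(tree, opts, "a", 5))
```

Because each (task, budget) pair is solved once, the running time of this sketch is bounded by the number of tasks times the deadline times the number of core types, which is why integer times are assumed.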
IEEE Transactions on Computers | 2014
Jingtong Hu; Qingfeng Zhuge; Chun Jason Xue; Wei-Che Tseng; Shouzhen Gu; Edwin Hsing-Mean Sha
In power and size sensitive embedded systems, non-volatile memories (NVMs) are replacing DRAM as the main memory since they have higher density, lower static power consumption, and lower costs. Unfortunately, these technologies are limited by their endurance and long write latencies. To minimize the main memory access time and extend the lifetime of the NVM, we optimally schedule tasks by an ILP formulation. We also present a heuristic, Concatenation Scheduling, to solve large problems in a reasonable amount of time. Our experimental results show that when compared with list scheduling, concatenation scheduling can reduce the total memory access time by an average of 9.99% and increase the lifetime of the NVM by 26.66%. When compared with list scheduling, ILP can reduce the total memory access time by an average of 12.39% and increase the lifetime of the NVM by 38.74%.
IEEE Transactions on Parallel and Distributed Systems | 2015
Shouzhen Gu; Qingfeng Zhuge; Juan Yi; Jingtong Hu; Edwin Hsing-Mean Sha
Multi-core processors have been adopted in modern embedded systems to meet ever-increasing performance requirements. Scratchpad memory (SPM), a software-controlled on-chip memory, has been used in embedded systems as an alternative to hardware-controlled cache due to its advantages in die area, power consumption, and timing predictability. SPMs in multi-core systems can be accessed by both the local core and remote cores. To alleviate data contention on an SPM unit, multi-port SPMs are employed in multi-core systems. In such systems, proper task scheduling and data assignment can significantly improve overall performance by exploiting the parallelism of computation tasks and concurrent data accesses on SPMs. Scheduling for multi-core systems is NP-Complete in general. In this paper, we propose an ILP formulation to optimally determine the task scheduling and data assignment on multi-core systems with multi-port SPMs. Since ILP takes exponential time to finish, we also propose a heuristic method, including the task assignment with remote access reduced (TARAR) algorithm and the minimum memory access cost (MMAC) algorithm, to obtain near-optimal solutions within polynomial time. According to the experimental results, the ILP formulation can improve the system performance by 23.02 percent over the HAFF algorithm on average, while the heuristic algorithm can improve the system performance by 16.48 percent over HAFF on average.
Journal of Systems Architecture | 2014
Linbo Long; Duo Liu; Jingtong Hu; Shouzhen Gu; Qingfeng Zhuge; Edwin Hsing-Mean Sha
Phase change memory (PCM) has emerged as a promising candidate to replace DRAM in embedded systems, due to its appealing properties, such as zero leakage power, scalability, shock resistance, and high density. However, it can only sustain a limited number of write operations. Moreover, a program in an embedded system usually distributes write traffic in an extremely unbalanced way, which can further decrease PCM lifetime. In this paper, we propose a space-based wear leveling technique at the software compiler level that exploits program-specific features. The basic idea is to extend frequently written variables into specific-sized arrays and evenly distribute writes across the allocated array. In this way, we can effectively distribute the write traffic of the program across the whole PCM chip. A space allocation and reuse (SAR) strategy and a polynomial-time algorithm are proposed to produce optimal and near-optimal space allocation, respectively, for achieving a balanced write distribution. The experimental results show that our technique can greatly extend the lifetime of PCM-based embedded systems compared with the previous work, and achieves approximately 94% of the theoretical maximum lifetime. Compared with a baseline scheme without a wear-leveling mechanism, our technique introduces no more than 0.8% extra writes and 0.7% running overhead.
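The basic idea described above, extending a frequently written variable into an array and spreading its writes over the array, can be illustrated with a small simulation. This is a sketch under stated assumptions and not the paper's SAR strategy: writes are rotated round-robin, the slot index is assumed to be kept outside PCM (for example in a register), and the class name `SpreadVariable` is hypothetical.

```python
class SpreadVariable:
    """Simulate one logical variable backed by several memory cells."""

    def __init__(self, slots):
        if slots < 1:
            raise ValueError("at least one slot is required")
        self.cells = [None] * slots        # simulated PCM cells
        self.write_counts = [0] * slots    # wear on each cell
        self.current = slots - 1           # so the first write lands in slot 0

    def write(self, value):
        # Advance to the next cell before writing, so consecutive writes
        # wear different cells instead of the same one.
        self.current = (self.current + 1) % len(self.cells)
        self.cells[self.current] = value
        self.write_counts[self.current] += 1

    def read(self):
        # The most recently written cell holds the variable's value.
        return self.cells[self.current]


counter = SpreadVariable(slots=4)
for i in range(10):
    counter.write(i)

print(counter.read())          # latest value written
print(counter.write_counts)    # wear per cell after 10 writes
```

With a single cell, all ten writes would hit the same location; with four cells, the most heavily written cell in this simulation absorbs three writes, at the price of three extra cells of space.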
IEEE Transactions on Multi-Scale Computing Systems | 2016
Chen Pan; Shouzhen Gu; Mimi Xie; Yongpan Liu; Chun Jason Xue; Jingtong Hu
Non-volatile memories (NVMs) have many promising characteristics, such as low leakage power, low cost, non-volatility, and high scalability, which make them attractive as the main memory of embedded systems. However, one constraint that undermines the case for NVMs as main memory is their limited write endurance. To tackle this problem, this paper proposes five techniques: Rearrangement Inequality Based Page Allocation (RIPA), Virtual Page Mapping (VPM), On-demand Memory Merging and Splitting (OMS), Periodical Page Swapping (PPS), and Normalized Boundary Calibration (NBC). These techniques evenly distribute the writes on Nonvolatile Main Memory (NVMM) purely at the Operating System (OS) level, which can greatly extend the lifetime of NVMM. Because it needs no extra hardware support, OS management is easy to integrate into existing embedded systems. The experimental results show that, with less than 0.6 percent performance overhead, the proposed techniques can extend the lifetime of NVMM to 17.28 times that of traditional methods.
Embedded and Real-Time Computing Systems and Applications | 2013
Linbo Long; Duo Liu; Jingtong Hu; Shouzhen Gu; Qingfeng Zhuge; Edwin Hsing-Mean Sha
Phase change memory (PCM) has emerged as a promising candidate to replace DRAM in embedded systems. However, it can only sustain a limited number of write operations. To solve this issue, this paper proposes a novel and effective wear-leveling technique at the software level to prolong the lifetime of PCM-based embedded systems. A polynomial-time algorithm, the Multi-Space Wear Leveling Algorithm (MWL), is proposed to achieve effective wear leveling. The experimental results show that our technique can greatly extend the lifetime of PCM-based embedded systems compared with the previous work. Compared with a method that does not adopt wear leveling, it introduces no more than 0.7% extra writes and 0.6% running overhead.
IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems | 2016
Shouzhen Gu; Edwin Hsing-Mean Sha; Qingfeng Zhuge; Yiran Chen; Jingtong Hu
Applications that run in embedded systems normally should finish within a timing constraint in an energy-efficient fashion. Due to these two requirements, embedded systems often employ software-controlled scratch pad memory (SPM) instead of hardware-controlled cache as their on-chip memory. The data accesses in SPMs are controlled purely by software, which provides better time predictability and precise time control. In this paper, we propose a time-, energy-, and area-efficient domain wall memory (DWM)-based SPM for embedded systems. To efficiently manage this novel type of SPM, an integer nonlinear programming formulation and the instructions group schedule algorithm are proposed to generate the memory access instruction schedule and data placement. In addition, the longest move reduce algorithm is proposed to configure different types of DWM memory cells to achieve minimal area size. Experimental results show that the proposed techniques can generate a configuration of DWM-based SPM with minimal area size while satisfying the time constraint.
Signal Processing Systems | 2015
Juan Yi; Qingfeng Zhuge; Jingtong Hu; Shouzhen Gu; Mingwen Qin; Edwin Hsing-Mean Sha
Heterogeneous multiprocessors have become mainstream computing platforms and are increasingly employed for critical applications. Inherently, heterogeneous systems are more complex than homogeneous systems, and the added complexity increases the potential for system failures. This paper addresses this problem by proposing a reliability-guaranteed task assignment and scheduling approach for heterogeneous multiprocessors under a timing constraint. We propose a two-phase approach to solve this problem. In the first phase, we determine assignments for heterogeneous multiprocessors such that both the reliability requirement and the timing constraint can be satisfied with the minimum total system cost. Efficient algorithms are proposed to produce optimal solutions for simple-path and tree-structured task graphs in polynomial time. When the input graph is a directed acyclic graph (DAG), the task assignment problem is NP-Complete. In this situation, we first develop an Integer Linear Programming (ILP) formulation to generate optimal solutions. Then, we propose a polynomial-time heuristic algorithm to find near-optimal solutions. In the second phase, based on the assignments obtained in the first phase, we propose a minimum resource scheduling algorithm to generate a schedule and a feasible configuration that uses as few resources as possible. Experimental results show that the proposed algorithms and the ILP formulation can effectively reduce the total cost compared with the previous work.
International Conference on Acoustics, Speech, and Signal Processing | 2013
Shouzhen Gu; Qingfeng Zhuge; Jingtong Hu; Juan Yi; Edwin Hsing-Mean Sha
Virtually Shared Scratch-Pad Memory (VS-SPM) with multiple memory banks can be used as on-chip memory on multiprocessor systems-on-chips (MPSoCs) to close the speed gap between fast processors and slow memories. Because computation tasks run in parallel on processors and data accesses proceed concurrently on each SPM, the results of task assignment and data allocation can significantly affect the overall performance of a schedule. In this paper, we propose ILP formulations for solving the problem of task assignment and scheduling on MPSoCs with multi-bank VS-SPM. We also propose a polynomial-time algorithm, the Potential Remote Access Prediction (PRAP) algorithm, to generate near-optimal results efficiently. The experimental results demonstrate the effectiveness of our technique.
Signal Processing Systems | 2016
Shouzhen Gu; Qingfeng Zhuge; Juan Yi; Jingtong Hu; Edwin Hsing-Mean Sha
With the advance of memory technologies, multiple types of memory, such as different kinds of non-volatile memory (NVM), SRAM, and DRAM, provide a flexible configuration with respect to performance, energy, and cost. For improving the performance of systems with multiple types of memory, data allocation is one of the most important tasks. Previous studies on the data allocation problem assume the worst (fixed) case of data-access frequencies. However, a data allocation produced from the worst case usually leads to inferior performance most of the time. In this paper, we model this problem with probabilities and design efficient algorithms that can give optimal-cost data allocation with a guaranteed probability. We propose the DAGP algorithm, which produces a set of feasible data allocation solutions that yield the minimum access time or cost guaranteed with a given probability. We also propose a polynomial-time algorithm, the MCS algorithm, to solve this problem. The experiments show that our technique can significantly reduce the access cost compared with the technique that considers the worst-case scenario. For example, compared with the optimal result generated from the worst cases, DAGP can reduce memory access cost by 9.92% on average when the guaranteed probability is set to 0.9. Moreover, for 90 percent of cases, memory access time is reduced by 12.47% on average. Compared with the greedy algorithm, DAGP and MCS can reduce memory access cost by 78.92% and 44.69% on average, respectively, when the guaranteed probability is set to 0.9.
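To make the notion of a cost "guaranteed with a given probability" concrete, the sketch below evaluates one fixed data allocation. It is not the DAGP or MCS algorithm, which the abstract does not describe in detail. It assumes, for illustration only, that each data item has a discrete distribution of access frequencies, that items are independent, and that the item's assigned memory has a fixed per-access cost. All names are hypothetical.

```python
def cost_distribution(items):
    """Return {total_cost: probability} for one fixed allocation.

    items: list of (per_access_cost, [(frequency, probability), ...]),
           one entry per data item; items are assumed independent.
    """
    dist = {0: 1.0}
    for per_access_cost, frequencies in items:
        combined = {}
        for total, p in dist.items():
            for frequency, q in frequencies:
                key = total + frequency * per_access_cost
                combined[key] = combined.get(key, 0.0) + p * q
        dist = combined
    return dist


def guaranteed_cost(items, probability):
    """Smallest cost C such that P(total cost <= C) >= probability."""
    cumulative = 0.0
    for cost, p in sorted(cost_distribution(items).items()):
        cumulative += p
        if cumulative >= probability - 1e-12:  # tolerate float rounding
            return cost
    raise ValueError("probability exceeds the total probability mass")


# Hypothetical allocation: item A in a memory costing 2 per access,
# item B in a memory costing 1 per access.
allocation = [
    (2, [(10, 0.5), (20, 0.5)]),
    (1, [(5, 0.75), (50, 0.25)]),
]
print(guaranteed_cost(allocation, 0.75))  # cost met in 75% of cases
print(guaranteed_cost(allocation, 1.0))   # worst-case cost
```

In this made-up example the worst-case cost is twice the cost that holds with probability 0.75, which shows why an allocation tuned only for the worst case can be pessimistic for most runs.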