Xianzhang Chen
Chongqing University
Publication
Featured research published by Xianzhang Chen.
IEEE Transactions on Very Large Scale Integration Systems | 2016
Xianzhang Chen; Edwin Hsing-Mean Sha; Qingfeng Zhuge; Chun Jason Xue; Weiwen Jiang; Yuangang Wang
Domain-wall memory (DWM) is becoming an attractive candidate to replace traditional memories because of its high density, low leakage power, and low access latency. Data on DWM are accessed by shift operations that move the data stored on nanowires to read/write ports. Due to this construction, data accesses on DWM exhibit varying access latencies. Therefore, the data placement (DP) strategy has a significant impact on the performance of data accesses on DWM. In this paper, we prove the nondeterministic polynomial time (NP)-completeness of the DP problem on DWM. For DWMs organized as a single DWM block cluster (DBC), we present integer linear programming formulations that solve the problem optimally. We also propose an efficient single DBC placement (S-DBC-P) algorithm that exploits the benefits of multiple read/write ports and data locality. Compared with the sequential DP strategy, S-DBC-P reduces shift operations by 76.9% on average for eight-port DWMs. Furthermore, for the DP problem on DWMs organized as multiple DBCs, we develop an efficient multiple DBC placement (M-DBC-P) algorithm that utilizes the parallelism of DBCs. The experimental results show that M-DBC-P achieves a 90% performance improvement over the sequential DP strategy.
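To make the shift-cost argument concrete, here is a minimal Python sketch that counts shift operations on a single nanowire. It is a simplified cost model assumed for illustration, not the paper's ILP formulation or the S-DBC-P algorithm: `count_shifts`, the greedy nearest-port rule, and the starting offset of 0 are all hypothetical choices.

```python
def count_shifts(accesses, placement, ports):
    """Count shift operations for an access trace on one DWM nanowire.

    Assumed model (illustrative only):
      * placement maps each data item to a cell index on the nanowire;
      * ports lists the fixed positions of the read/write ports;
      * the nanowire starts at offset 0, and cell c sits under port q
        when c - offset == q;
      * every access shifts the item to whichever port is closest now.
    """
    offset = 0
    total = 0
    for item in accesses:
        cell = placement[item]
        # Offsets that would align `cell` with each port; take the nearest one.
        target = min((cell - q for q in ports), key=lambda t: abs(t - offset))
        total += abs(target - offset)  # one shift operation per cell moved
        offset = target
    return total


seq = {"a": 0, "b": 1, "c": 2}  # sequential placement: items in order
print(count_shifts(["a", "b", "a", "c"], seq, ports=[0]))     # 4
print(count_shifts(["a", "b", "a", "c"], seq, ports=[0, 2]))  # 2
```

With the sequential placement a→0, b→1, c→2, the trace a, b, a, c costs 4 shifts with a single port at cell 0 and 2 shifts with ports at cells 0 and 2, the kind of saving from extra ports that the abstract describes. Choosing the nearest port per access is greedy and need not minimize the total over the whole trace.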
signal processing systems | 2016
Weiwen Jiang; Qingfeng Zhuge; Xianzhang Chen; Lei Yang; Juan Yi; Edwin Hsing-Mean Sha
Multiprocessor System-on-Chip with self-timed design is becoming increasingly attractive due to its ability to exploit the high parallelism of applications. Previous research on self-timed techniques has mostly focused on the hardware layer. However, correctly synthesizing self-timed systems remains difficult. In particular, how to configure a self-timed ring structure to achieve maximal throughput without deadlock is still an unsolved problem. A self-timed ring (STR) is composed of a ring of connected “stages”, each consisting of a processing element, communication units, and its current state. The correct configuration of an STR is determined by the initial state of each stage and the number of buffers inserted into the ring to maintain the correct behavior of applications on the STR. This paper establishes a series of theorems based on the properties of self-timed structures. Based on these theorems, the initial states and buffers can be set so as to guarantee a correct configuration. The theorems also yield mathematical formulas for calculating the throughput of an STR. The algorithms presented in the paper find the optimal initial configuration of an STR, which achieves the maximum throughput with the minimum number of inserted buffers. The experimental results show that the throughput of applications mapped onto an STR with the optimal configuration is improved by 64.99% on average compared with a synchronous system.
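One standard way to reason about STR throughput, assumed here rather than taken from the paper's theorems, is to model the ring as a timed marked graph, whose steady-state throughput is the minimum over all cycles of (tokens on the cycle) / (total delay of the cycle). The Python sketch below uses hypothetical parameters: `delays[i]` is the positive integer delay of stage i, `tokens[i]` the data tokens initially on the link from stage i to stage i+1, and `capacity[i]` that link's slots (1 plus any inserted buffers).

```python
from fractions import Fraction


def str_throughput(delays, tokens, capacity):
    """Steady-state throughput of a self-timed ring (iterations per time unit).

    Assumed timed-marked-graph model: stage i fires with integer delay
    delays[i]; the link i -> i+1 holds tokens[i] data tokens and
    capacity[i] - tokens[i] free slots (acknowledgements flowing back).
    Throughput is the minimum over all cycles of tokens / total delay.
    """
    n = len(delays)
    assert n >= 3 and len(tokens) == n and len(capacity) == n
    assert all(d > 0 for d in delays)
    assert all(0 <= t <= c for t, c in zip(tokens, capacity))
    total_delay = sum(delays)
    # Forward ring: data tokens travel through every stage.
    ratios = [Fraction(sum(tokens), total_delay)]
    # Backward ring: free slots (bubbles) travel the opposite way.
    ratios.append(Fraction(sum(capacity) - sum(tokens), total_delay))
    # Local handshake between neighbours i and i+1: capacity[i] tokens in total.
    for i in range(n):
        j = (i + 1) % n
        ratios.append(Fraction(capacity[i], delays[i] + delays[j]))
    return min(ratios)  # 0 means the ring deadlocks


unit = [1, 1, 1, 1]
print(str_throughput(unit, [1, 0, 0, 0], [1, 1, 1, 1]))  # 1/4
print(str_throughput(unit, [1, 1, 1, 1], [1, 1, 1, 1]))  # 0 (deadlock)
print(str_throughput(unit, [1, 1, 1, 1], [2, 1, 1, 1]))  # 1/4 with one buffer
```

With four unit-delay stages and one slot per link, one token gives a throughput of 1/4 and two tokens give 1/2, while filling every slot gives 0, a deadlock, because no free slot is left to circulate. Adding a single buffer to one link restores a throughput of 1/4, which mirrors why the initial states and the number of inserted buffers have to be chosen together.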
IEEE Transactions on Parallel and Distributed Systems | 2017
Weiwen Jiang; Edwin Hsing-Mean Sha; Xianzhang Chen; Lei Yang; Lei Zhou; Qingfeng Zhuge
High-level synthesis for real-time systems typically employs heterogeneous functional-unit types to achieve high-performance, low-cost designs. In the design phase, it is critical to determine which functional-unit type each operation in a given application is mapped to, such that the total cost is minimized while the deadline is met. For applications structured as paths or trees, existing approaches can obtain the minimum-cost assignment, called the “optimal assignment”, under which the resulting system satisfies a given timing constraint. However, it is still an open question whether efficient algorithms exist to obtain the optimal assignment for a directed acyclic graph (DAG) or, more generally, for a data-flow graph with cycles (cyclic DFG). For DAGs, by analyzing the properties of the problem, this paper designs an efficient algorithm that obtains the optimal assignment. For cyclic DFGs, we combine the problem with the retiming technique to thoroughly explore the design space. We formulate a Mixed Integer Linear Programming (MILP) model that gives the optimal solution. Because of the model's high time complexity, we also devise a practical algorithm that obtains near-optimal solutions within a minute. Experimental results show the effectiveness of our algorithms: compared with existing techniques, they achieve 25.70 and 30.23 percent reductions in total cost on DAGs and cyclic DFGs, respectively.
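For the path-structured case, which the abstract says existing approaches already solve optimally, a common formulation is a pseudo-polynomial dynamic program over accumulated execution time. The Python sketch below is that textbook-style DP under assumed inputs (integer execution times, one `(time, cost)` pair per functional-unit type); it is not the paper's DAG algorithm or its MILP model, and `path_assignment` is a hypothetical name.

```python
def path_assignment(ops, deadline):
    """Minimum-cost functional-unit assignment for a path of operations.

    ops[i] lists (time, cost) pairs, one per FU type that operation i may
    be mapped to. On a path the operations run one after another, so the
    schedule length is the sum of the chosen times.
    Returns (total_cost, choices) or None if no assignment meets the deadline.
    """
    # best[t] = cheapest (cost, choices) for the prefix with total time t
    best = {0: (0, [])}
    for options in ops:
        nxt = {}
        for t, (c, choices) in best.items():
            for k, (time, cost) in enumerate(options):
                nt, nc = t + time, c + cost
                # Keep only feasible partial schedules, and the cheapest per time.
                if nt <= deadline and (nt not in nxt or nc < nxt[nt][0]):
                    nxt[nt] = (nc, choices + [k])
        best = nxt
    if not best:
        return None
    return min(best.values(), key=lambda v: v[0])


print(path_assignment([[(1, 10), (3, 2)], [(2, 8), (4, 1)]], deadline=5))  # (10, [1, 0])
```

For two operations with options `[(1, 10), (3, 2)]` and `[(2, 8), (4, 1)]` and a deadline of 5, the cheapest feasible assignment uses the second type for the first operation and the first type for the second, at a total cost of 10. On a DAG the schedule length is the longest path rather than a single sum, which is why this DP does not carry over directly.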
embedded software | 2016
Xianzhang Chen; Edwin Hsing-Mean Sha; Weiwen Jiang; Qingfeng Zhuge; Junxi Chen; Jiejie Qin; Yuansong Zeng
Non-Volatile Memory (NVM) is becoming an attractive candidate for the swap area in embedded systems because of its near-DRAM speed, low energy consumption, high density, and byte-addressability. Swapping data from DRAM out to NVM, however, can incur large performance and energy penalties and deplete the lifetime of NVM, so traditional swap mechanisms may need to be revisited. Even though several swap mechanisms have been proposed for hybrid DRAM-NVM systems, most of them achieve limited performance because they do not consider the data access features of applications. In this paper, we analyze the data access features of different applications. Then, we propose a swap mechanism, called Refinery Swap, that simultaneously improves system performance, reduces energy consumption, and increases the lifetime of NVM. Refinery Swap employs two algorithms that exploit the data access features of applications and the characteristics of different kinds of memory media. Refinery Swap reduces both the swap operations in the system and the writes to NVM. Extensive experiments are conducted with standard benchmarks. The experimental results show that the lifetime of NVM in a system with Refinery Swap can be 83 times that under Linux Swap. The performance of a system with Refinery Swap can be 17 times that of DR.Swap, the state-of-the-art swap mechanism for hybrid-memory embedded systems. Moreover, the system's energy consumption is 17 times lower.
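Refinery Swap's two algorithms are not described in this abstract, so the Python snippet below only illustrates the general idea of write-aware victim selection that the abstract motivates (NVM writes cost performance, energy, and lifetime). The `Page` profile, `select_swap_victims`, and the ranking rule are all hypothetical and should not be read as the paper's method.

```python
from dataclasses import dataclass


@dataclass
class Page:
    pid: int     # page identifier
    reads: int   # reads observed in a recent window (hypothetical profile)
    writes: int  # writes observed in the same window


def select_swap_victims(pages, k):
    """Pick k DRAM pages to swap out to NVM (hypothetical policy).

    Prefer pages that are rarely written, since every NVM write costs time,
    energy, and endurance; break ties by total accesses so colder pages
    leave DRAM first.
    """
    ranked = sorted(pages, key=lambda p: (p.writes, p.reads + p.writes))
    return [p.pid for p in ranked[:k]]


pages = [Page(1, 50, 0), Page(2, 5, 40), Page(3, 2, 1), Page(4, 0, 0)]
print(select_swap_victims(pages, 2))  # [4, 1]
```

For pages profiled as (reads, writes) = (50, 0), (5, 40), (2, 1), and (0, 0), asking for two victims returns the untouched page and the read-only page, while the write-heavy page stays in DRAM.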
high performance computing and communications | 2015
Edwin Hsing-Mean Sha; Weiwen Jiang; Qingfeng Zhuge; Lei Yang; Xianzhang Chen
Traditional synchronous systems, which rely on a global clock to maintain synchronization, suffer from problems in worst-case performance and power consumption. A self-timed system, which does not depend on a global clock, is a strong candidate to solve these problems. In this paper, we study a probabilistic self-timed system model in which the execution time of each task is represented by a random variable. This paper presents fundamental properties of the timing behavior of the probabilistic self-timed system and establishes formulas to calculate its throughput. Using these results, efficient algorithms are designed to optimize system throughput and minimize energy consumption. Experimental results show that the throughput of self-timed systems optimized by our algorithms is 33.73% higher than that of optimized synchronous systems. Additionally, the proposed energy-minimization algorithms make a good performance-energy tradeoff, improving energy consumption by 64.36% with little reduction in performance.
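The intuition for why a self-timed system can outperform a synchronous one when execution times are random fits in a few lines. The Python sketch below assumes a deliberately simple model that is not the paper's: one token circulates through the stages, each stage's integer execution time is drawn independently from a discrete distribution in every iteration, and the synchronous design gives each stage one clock period equal to the largest possible execution time. By the elementary renewal theorem, the self-timed ring then completes 1 / Σ E[T_i] iterations per time unit. `expected_time` and `compare_throughput` are hypothetical names.

```python
from fractions import Fraction


def expected_time(dist):
    """Mean of a discrete execution-time distribution [(time, prob), ...]."""
    assert sum(p for _, p in dist) == 1, "probabilities must sum to 1"
    return sum(t * p for t, p in dist)


def compare_throughput(stage_dists):
    """Long-run iterations per time unit for one token circulating a ring.

    Self-timed: each stage starts as soon as its predecessor finishes, so an
    iteration takes the sum of the actual execution times, and the long-run
    rate is 1 / sum(E[T_i]).
    Synchronous: each stage takes one clock period sized for the worst case.
    """
    self_timed = 1 / sum(expected_time(d) for d in stage_dists)
    period = max(t for d in stage_dists for t, _ in d)
    synchronous = Fraction(1, period * len(stage_dists))
    return self_timed, synchronous


a = [(2, Fraction(1, 2)), (4, Fraction(1, 2))]  # mean 3
b = [(1, Fraction(3, 4)), (5, Fraction(1, 4))]  # mean 2
st, sync = compare_throughput([a, b])
print(st, sync)  # 1/5 1/10
```

In this example the self-timed ring completes 1/5 iterations per time unit, whereas a worst-case clock of 5 time units per stage yields 1/10, because the synchronous design always pays for the slowest possible case.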
Design Automation for Embedded Systems | 2013
Penglin Dai; Qingfeng Zhuge; Xianzhang Chen; Weiwen Jiang; Edwin Hsing-Mean Sha
Hybrid main memory architectures employing both DRAM and non-volatile memories (NVMs) are becoming increasingly attractive because they offer opportunities to exploit the benefits of various memory technologies, for example, high-speed writes on DRAM and low standby power consumption on NVMs. File data-block placement (FDP) on different types of page cache is an important problem that directly impacts the performance and cost of file operations on a hybrid main memory architecture. The page cache is widely used in modern operating systems to expedite file I/O by mapping disk-backed file data-blocks in main memory into the process address space in virtual memory. In a hybrid main memory, the operating system can allocate different types of memory, with different read/write costs, as page cache. In this paper, we study the problem of placing file data-blocks on different types of page cache to minimize the total cost of file accesses in a program. We propose a dynamic programming algorithm, the FDP Algorithm, that solves the problem optimally for simple programs. We develop an ILP model of the file data-block placement problem for programs composed of multiple regions with data dependencies. An efficient heuristic, the global file data-block placement (GFDP) Algorithm, is proposed to obtain near-optimal solutions for the problem of global file data-block placement on hybrid main memory. Experiments on a set of benchmarks show the effectiveness of the GFDP algorithm compared with a greedy strategy and the ILP: the GFDP algorithm reduces the total cost of file accesses by 51.3% on average compared with the greedy strategy.
Future Generation Computer Systems | 2018
Lin Wu; Qingfeng Zhuge; Edwin Hsing-Mean Sha; Xianzhang Chen; Linfeng Cheng
languages, compilers, and tools for embedded systems | 2017
Weiwen Jiang; Edwin Hsing-Mean Sha; Qingfeng Zhuge; Hailiang Dong; Xianzhang Chen
Journal of Computational Science | 2017
Weiwen Jiang; Edwin Hsing-Mean Sha; Xianzhang Chen; Lin Wu; Qingfeng Zhuge
IEEE Access | 2017
Lin Wu; Qingfeng Zhuge; Edwin Hsing-Mean Sha; Xianzhang Chen; Linfeng Cheng
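The FDP Algorithm in the Design Automation for Embedded Systems (2013) abstract above is a dynamic program whose details are not given there. As a hedged illustration of how such a DP can trade DRAM's fast writes against NVM's low standby power, the Python sketch below places a single file data-block over a trace of steps and allows migration between memories. The cost table, the migration cost, the free initial placement, and `place_block` are all hypothetical, and the sketch ignores page-cache capacity and data dependencies between program regions.

```python
# Hypothetical per-step costs; "-" means the block is idle during that step.
COSTS = {
    "DRAM": {"r": 1, "w": 1, "-": 1},  # fast writes, but pays standby power
    "NVM": {"r": 2, "w": 8, "-": 0},   # costly writes, near-zero standby power
}
MIGRATION = 3  # hypothetical cost of moving the block between memories


def place_block(trace, costs=COSTS, migration=MIGRATION):
    """Return (min_cost, locations) for one file data-block over a trace.

    locations[t] is the memory holding the block during step t. The initial
    placement is free. This is a shortest-path DP over (step, memory) states.
    """
    best = {m: (0, []) for m in costs}
    for op in trace:
        nxt = {}
        for m in costs:
            # Cheapest way to be in memory m: stay, or migrate from elsewhere.
            cost, path = min(
                ((c + (0 if prev == m else migration), p)
                 for prev, (c, p) in best.items()),
                key=lambda x: x[0],
            )
            nxt[m] = (cost + costs[m][op], path + [m])
        best = nxt
    return min(best.values(), key=lambda x: x[0])


print(place_block("w" + "-" * 10 + "w")[0])  # 8
```

With these costs, a block written once, left idle for ten steps, and written again is cheapest when it sits in DRAM for the writes and in NVM while idle (total cost 8, versus 12 for staying in DRAM throughout).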