Guiyuan Jiang | Researchain

Publications

Featured research published by Guiyuan Jiang.

IEEE Transactions on Parallel and Distributed Systems | 2014

Constructing Sub-Arrays with Short Interconnects from Degradable VLSI Arrays

Wu Jigang; Thambipillai Srikanthan; Guiyuan Jiang; Kai Wang

Reducing the interconnection length of VLSI arrays reduces capacitance, power dissipation, and dynamic communication cost between the processing elements (PEs). This paper develops efficient algorithms for constructing tightly coupled subarrays from mesh-connected VLSI arrays with faulty PEs. For a target (logical) array of given size r×s, the proposed algorithm searches for and reroutes a physical r×s subarray that has the fewest faults, yielding an approximate target array that is subsequently extended to the desired target array. Experimental results show that redundant interconnects can be reduced by over 65 percent for a 64×64 target array on a 512×512 host array with no more than 1 percent faults. In addition, we propose a recursive divide-and-conquer algorithm for constructing the maximum target array (MTA) and establish a lower bound on the total interconnection length of the MTA. Experimental results show that the proposed algorithm reduces long interconnects by over 33 percent for the MTA derived from a 512×512 host array with no more than 1 percent faults. Moreover, the total interconnection length of the target array produced by the proposed algorithm is close to the lower bound when the number of faults is relatively small.
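The first step in the abstract, locating the physical r×s subarray with the fewest faults, can be illustrated with a two-dimensional prefix sum (summed-area table). This is a minimal sketch under stated assumptions, not the authors' search-and-reroute algorithm: it assumes the host array is given as a 0/1 fault matrix, and the function name `min_fault_window` is hypothetical. It only scores candidate windows; it does not perform the rerouting or the extension to the full target array.

```python
from typing import List, Tuple


def min_fault_window(faults: List[List[int]], r: int, s: int) -> Tuple[int, int, int]:
    """Return (row, col, count) for the r x s window with the fewest faults.

    Assumption: faults[i][j] is 1 if the physical PE at (i, j) is faulty, else 0.
    Ties keep the first window found in row-major order.
    """
    m, n = len(faults), len(faults[0])
    if r < 1 or s < 1 or r > m or s > n:
        raise ValueError("window size must fit inside the host array")

    # Summed-area table: P[i][j] = number of faults in rows 0..i-1, cols 0..j-1.
    P = [[0] * (n + 1) for _ in range(m + 1)]
    for i in range(m):
        for j in range(n):
            P[i + 1][j + 1] = faults[i][j] + P[i][j + 1] + P[i + 1][j] - P[i][j]

    best = None
    for i in range(m - r + 1):
        for j in range(n - s + 1):
            # Faults in rows i..i+r-1 and cols j..j+s-1, computed in O(1).
            count = P[i + r][j + s] - P[i][j + s] - P[i + r][j] + P[i][j]
            if best is None or count < best[2]:
                best = (i, j, count)
    return best
```

After the O(mn) table build, each window costs O(1), so scanning every window of an m×n host array takes O(mn) time overall.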

Parallel Computing | 2013

Efficient reconfiguration algorithms for communication-aware three-dimensional processor arrays

Guiyuan Jiang; Jigang Wu; Jizhou Sun

Homogeneous processor arrays are emerging in tera-scale computation, and effective fault tolerance techniques are essential for improving the reliability of such complex integrated circuits. We study degradable processor arrays that achieve fault tolerance through reconfiguration. Three bypass schemes and three rerouting schemes are proposed to reconfigure three-dimensional processor arrays with defective processors into fault-free target arrays. A heuristic algorithm is proposed to construct a target array on the selected rows and columns. It is also proved that the proposed greedy plane rerouting algorithm (GPR) produces the maximum target array. In addition, this paper considers the problem of constructing a communication-efficient array, and an algorithm is proposed to refine the communication among processors within the target array constructed by GPR. Experimental results show that GPR produces target arrays with higher harvest and lower degradation on host arrays with fault density no more than 5%. Moreover, communication performance is significantly improved by reducing the number of long interconnects, with an average improvement of about 34% across all cases considered in this paper.

High Performance Computing and Communications | 2013

An Efficient Topology Reconfiguration Algorithm for NoC Based Multiprocessor Arrays

Chao Wang; Jigang Wu; Guiyuan Jiang; Jizhou Sun

To ensure the reliability of a high-performance multiprocessor system with a reconfigurable interconnect, an interconnect topology must be computed that allows high-throughput load distribution on top of the physical processor array. In this paper, we investigate the problem of topology reconfiguration for Network-on-Chip (NoC) based multiprocessor arrays with faulty processing elements (PEs). We propose two types of shift operations, the row bi-shift operation and the column shift operation, for redistributing the fault-free PEs of a processor array during reconfiguration. We solve the topology reconfiguration problem with two efficient algorithms. The first algorithm, denoted CRS, generates a logical topology with desirable communication performance by alternately performing the two shift operations. The second algorithm uses tabu search to revise the initial topology produced by CRS and further improve communication performance. Experimental results validate the efficiency of the proposed algorithms in comparison with previous approaches. For 16×16 physical arrays with 30% faulty PEs, the proposed approaches improve on existing algorithms by up to 39% in terms of message latency and congestion.

International Conference on Parallel and Distributed Systems | 2012

Non-Backtracking Reconfiguration Algorithm for Three-dimensional VLSI Arrays

Guiyuan Jiang; Jigang Wu; Jizhou Sun

Fast reconfiguration is one of the main challenges in fault-tolerant VLSI arrays. In these arrays, some processing elements (PEs) are invalid: they are fault-free but cannot be used to form a target array. These invalid PEs cause backtracking during reconfiguration. This paper proposes a non-backtracking reconfiguration (NBR) algorithm for three-dimensional degradable VLSI arrays with faults. The proposed algorithm accelerates reconfiguration without loss of harvest by eliminating the backtracking that frequently occurs in the existing algorithm BGPR cited in this paper. First, the invalid PEs of the host array are identified in a preprocessing step. The NBR algorithm then constructs each logical plane from bottom to top in the host array and updates the set of invalid PEs after each logical plane is constructed. Experimental results show that NBR is more scalable than BGPR and can therefore reconfigure large host arrays much faster. In addition, as fault density increases, the runtime of NBR tends to decrease, whereas that of BGPR increases.

IEEE/ACM Transactions on Networking | 2017

Joint Charging Tour Planning and Depot Positioning for Wireless Sensor Networks Using Mobile Chargers

Guiyuan Jiang; Siew Kei Lam; Yidan Sun; Lijia Tu; Jigang Wu

Recent breakthroughs in wireless energy transfer technology have enabled wireless sensor networks (WSNs) to operate with zero downtime through the use of mobile energy chargers (MCs) that periodically replenish the energy supply of the sensor nodes. Because of the limited battery capacity of the MCs, a significant number of MCs and charging depots are required to guarantee perpetual operation in large-scale networks. Existing methods for reducing the number of MCs and charging depots treat charging tour planning and depot positioning separately, even though the two problems are interdependent. This paper is the first to jointly consider charging tour planning and MC depot positioning for large-scale WSNs. The proposed method solves the problem in three stages: charging tour planning, candidate depot identification and reduction, and depot deployment and charging tour assignment. The proposed charging scheme also considers the association between the MC charging cycle and the operational lifetime of the sensor nodes in order to maximize the energy efficiency of the MCs. This overcomes a limitation of existing approaches, in which MCs with small battery capacity end up charging sensor nodes more frequently than necessary, while MCs with large battery capacity return to the depots to replenish themselves before they have fully transferred their energy to the sensor nodes. Compared with existing approaches, the proposed method reduces the number of MCs by 64% on average and increases the ratio of total charging time to total traveling time by 19.7 times on average.

Network and Parallel Computing | 2013

Efficiency of Flexible Rerouting Scheme for Maximizing Logical Arrays

Guiyuan Jiang; Jigang Wu; Jizhou Sun

In a multiprocessor array, some processing elements (PEs) fail to function normally due to hardware defects or soft faults caused by overheating, overload, or occupancy by other running applications. The fault-tolerant reconfiguration considered in this paper reorganizes the fault-free PEs of a processor array with faults into a logical array with a regular mesh topology by changing the interconnections among PEs. This paper demonstrates the efficiency of a flexible rerouting scheme for maximizing the usage of fault-free PEs by developing an efficient reconfiguration algorithm without backtracking. The proposed algorithm constructs logical columns from left to right on candidate PE sets and, once a logical column is formed, updates the candidate sets by excluding the PEs that cannot be used. It is also proved that the proposed heuristic algorithm generates the maximum-size logical array in linear time. Experimental results show that 123 logical columns can be constructed on 256×256 host arrays with a fault density of 30%, an improvement of 51% over the previous algorithm, which produces only 82 logical columns. Furthermore, our algorithm generates target arrays with harvest over 56% on host arrays with a fault density of 50%, whereas the previous work cited in this paper fails to construct any target array in this case.
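To make the idea of building logical columns from left to right concrete, here is a simplified greedy sketch. It is not the flexible rerouting scheme from the paper. It assumes (hypothetically) that every physical row is used, that a PE may link only to a PE in the next row at most one column away, and that logical columns may not cross. In each row it takes the leftmost usable PE. When a row has no connectable PE it stops without backtracking, so it may build fewer columns than the algorithm described in the abstract. The name `greedy_columns` is illustrative.

```python
from typing import List, Optional


def greedy_columns(faulty: List[List[bool]]) -> List[List[int]]:
    """Build logical columns left to right under a simplified model.

    Each returned column lists one physical column index per physical row.
    Assumptions (not from the paper): all rows are used, vertically linked
    PEs differ by at most one column, and logical columns never cross.
    """
    m, n = len(faulty), len(faulty[0])
    boundary = [-1] * m  # per row: physical column of the previous logical column
    columns: List[List[int]] = []
    while True:
        col: List[int] = []
        for i in range(m):
            lo = boundary[i] + 1  # must lie strictly right of the previous column
            if i == 0:
                candidates = range(lo, n)
            else:
                prev = col[-1]  # link to the PE chosen in the row above
                candidates = range(max(lo, prev - 1), min(n, prev + 2))
            # Greedy choice: leftmost fault-free candidate in this row.
            choice: Optional[int] = next(
                (j for j in candidates if not faulty[i][j]), None
            )
            if choice is None:
                return columns  # column cannot be completed; no backtracking
            col.append(choice)
        columns.append(col)
        boundary = col[:]
```

Choosing the leftmost candidate keeps as many PEs as possible free for the columns still to be built, which is the usual motivation for left-to-right greedy construction.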

The Journal of Supercomputing | 2015

Algorithmic aspects of graph reduction for hardware/software partitioning

Guiyuan Jiang; Jigang Wu; Siew Kei Lam; Thambipillai Srikanthan; Jizhou Sun

Hardware/software (HW/SW) partitioning is a major concern in heterogeneous multi-processor system-on-a-chip design, where the large design space prohibits rapid identification of optimal HW/SW solutions under tight time-to-market constraints. In this paper, we propose graph reduction techniques that reduce the design space for HW/SW partitioning without sacrificing partition quality. The proposed approach has two major phases: reducible sub-graph searching, and sub-graph evaluation and reduction. In the former phase, we design a dynamic-programming-based algorithm, named the path flow algorithm (PFA), to identify reducible sub-graph candidates in directed acyclic graphs (DAGs), since most previous works use DAGs as the task graph model. We also propose an algorithm, DeLoop, that transforms an arbitrary directed graph into a DAG such that all reducible sub-graphs of the original graph can be detected by running PFA on the DAG. Our approach thus overcomes a limitation of the existing approach by enabling the identification of candidate sub-graphs in arbitrary task graphs. In the latter phase, we propose a reduction model that enables accurate estimation of task execution time on hardware, and we design a method for selecting candidate sub-graphs for reduction. Experimental results demonstrate that the proposed methods not only reduce the design space but also notably improve partitioning quality, because the proposed sub-graph reduction model takes hardware-parallel execution of tasks into account.
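The abstract does not spell out how DeLoop works. As a generic illustration of turning an arbitrary directed task graph into a DAG, the sketch below substitutes the textbook technique of deleting depth-first-search back edges. Unlike DeLoop, it makes no guarantee about preserving reducible sub-graphs. The graph encoding (a dict of successor lists) and the name `remove_back_edges` are assumptions.

```python
from typing import Dict, List, Set, Tuple

Graph = Dict[str, List[str]]


def remove_back_edges(graph: Graph) -> Tuple[Graph, Set[Tuple[str, str]]]:
    """Return (dag, removed) where removed holds the DFS back edges deleted.

    Generic technique, not the DeLoop algorithm from the paper.
    """
    WHITE, GRAY, BLACK = 0, 1, 2  # unvisited, on the DFS stack, finished
    color = {v: WHITE for v in graph}
    for succs in graph.values():
        for w in succs:
            color.setdefault(w, WHITE)  # include nodes that only appear as targets
    back: Set[Tuple[str, str]] = set()

    def dfs(u: str) -> None:
        color[u] = GRAY
        for w in graph.get(u, []):
            if color[w] == GRAY:
                back.add((u, w))  # edge to an ancestor on the stack closes a cycle
            elif color[w] == WHITE:
                dfs(w)
        color[u] = BLACK

    for v in list(color):
        if color[v] == WHITE:
            dfs(v)

    dag = {v: [w for w in graph.get(v, []) if (v, w) not in back] for v in color}
    return dag, back
```

Removing back edges suffices because a directed graph is acyclic exactly when a DFS finds no back edge. The recursive DFS is adequate for small task graphs; very deep graphs would need an iterative version to stay within Python's recursion limit.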

International Conference on Algorithms and Architectures for Parallel Processing | 2014

Reducing the Interconnection Length for 3D Fault-Tolerant Processor Arrays

Guiyuan Jiang; Jigang Wu; Jizhou Sun; Longting Zhu

Compared with 2D processor arrays, the three-dimensional (3D) processor array has the benefits of lower interconnection latency, lower power consumption, and higher bandwidth. However, it suffers from frequent faults caused by overheating during massively parallel computing. To achieve fault tolerance in such a scenario, an effective method is to construct a fault-free sub-array that is as large as possible from the faulty array, so that the original application can still run on the sub-array. However, logical sub-arrays produced by previous works contain a large number of long interconnects, which increase communication cost, capacitance, and dynamic power dissipation. In this paper, we investigate the problem of reducing the interconnection length of a logical array. First, we prove that the problem is NP-hard. We then propose an efficient heuristic that reduces the interconnection redundancy of a logical array by reducing the number of long interconnects in each logical plane, with each logical plane optimized based on statistical information. Experimental results show that, on 32×32×32 host arrays with fault densities ranging from 0.1% to 5%, the proposed algorithm reduces the interconnection length by 49.7% and 29.8% on average compared with the existing algorithms GPR and CAR, respectively.

The Journal of Supercomputing | 2014

Parallel reconfiguration algorithms for mesh-connected processor arrays

Jigang Wu; Guiyuan Jiang; Yuze Shen; Siew Kei Lam; Jizhou Sun; Thambipillai Srikanthan

Effective fault tolerance techniques are essential for improving the reliability of multiprocessor systems. At the same time, fault tolerance must be achieved at high speed to meet the real-time constraints of embedded systems. Although parallelism has often been exploited to increase performance, to the best of our knowledge there has been no previously reported work on parallel reconfiguration of mesh-connected processor arrays with faults. This paper presents two parallel algorithms to accelerate the reconfiguration of processor arrays. The first algorithm reconfigures a host array in parallel using multiple threads, which execute independently within a safe rerouting distance. The second algorithm uses a divide-and-conquer approach that first generates the leftmost segments in parallel and then merges the segments in parallel. Simulation results on a large number of instances confirm that, compared with the conventional algorithm, the proposed algorithms significantly accelerate reconfiguration without loss of harvest.

Journal of Parallel and Distributed Computing | 2014

Flexible rerouting schemes for reconfiguration of multiprocessor arrays

Guiyuan Jiang; Jigang Wu; Jizhou Sun; Yiyi Gao

In a multiprocessor array, some processing elements (PEs) fail to function normally due to hardware defects or soft faults caused by overheating, overload, or occupancy by other running applications. Fault-tolerant reconfiguration reorganizes the fault-free PEs into a new regular topology by changing the interconnections among PEs. This paper investigates the problem of constructing a logical array that is as large as possible, with short interconnects, from a physical array with faults. A flexible rerouting scheme is developed to improve the efficiency of utilizing fault-free PEs. Under this scheme, two efficient reconfiguration algorithms are proposed. The first algorithm generates the maximum logical array (MLA) in linear time. The second algorithm reduces the interconnect length of the MLA and produces nearly optimal logical arrays relative to a lower bound on the interconnect length, which is also proposed in this paper. Experimental results validate the efficiency of the flexible rerouting scheme and the proposed algorithms. For 128×128 host arrays with 30% unavailable PEs, the proposed approaches improve on the existing algorithm by up to 44% in terms of logical array size while reducing interconnection redundancy by 49.6%. In addition, the proposed algorithms are more scalable than existing approaches: on host arrays with 50% unavailable PEs, they produce logical arrays with harvest over 56%, whereas existing approaches fail to construct a feasible logical array.
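Two quantities recur in these abstracts: the interconnect length of a logical array and its harvest. The sketch below computes both for a 2D logical mesh. It rests on simplifying assumptions that are not taken from the papers: wire length is the Manhattan distance between the physical positions of logically adjacent PEs, a link longer than 1 counts as long, and harvest is the number of PEs in the logical array divided by the number of fault-free PEs in the host array. The function names are hypothetical.

```python
from typing import List, Tuple

Coord = Tuple[int, int]  # physical (row, col) of a PE


def interconnect_stats(logical: List[List[Coord]]) -> Tuple[int, int]:
    """Return (total_length, long_links) for a logical mesh.

    logical[i][j] is the physical position of the PE at logical (i, j).
    Assumption: wire length = Manhattan distance between physical positions.
    """
    rows, cols = len(logical), len(logical[0])
    total, long_links = 0, 0
    for i in range(rows):
        for j in range(cols):
            for di, dj in ((0, 1), (1, 0)):  # right and down logical neighbours
                ni, nj = i + di, j + dj
                if ni < rows and nj < cols:
                    (a, b), (c, d) = logical[i][j], logical[ni][nj]
                    length = abs(a - c) + abs(b - d)
                    total += length
                    if length > 1:
                        long_links += 1
    return total, long_links


def harvest(logical: List[List[Coord]], fault_free: int) -> float:
    """Fraction of fault-free host PEs used by the logical array (assumed definition)."""
    return len(logical) * len(logical[0]) / fault_free
```

For an r×c logical mesh in which every link has length 1, the total is r(c−1) + (r−1)c = 2rc − r − c. This trivial bound is not the tighter lower bound derived in the paper.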
