Jigang Wu

Guangdong University of Technology

Publication

Featured research published by Jigang Wu.

parallel computing | 2013

Efficient reconfiguration algorithms for communication-aware three-dimensional processor arrays

Guiyuan Jiang; Jigang Wu; Jizhou Sun

Homogeneous processor arrays are emerging in tera-scale computation, and effective fault tolerance techniques are essential to improving the reliability of such complex integrated circuits. We study degradable processor arrays that achieve fault tolerance through reconfiguration. Three bypass schemes and three rerouting schemes are proposed to reconfigure three-dimensional processor arrays with defective processors into fault-free target arrays. A heuristic algorithm is proposed to construct a target array on the selected rows and columns. It is also proved that the proposed greedy plane rerouting algorithm (GPR) produces a maximum target array. In addition, the problem of constructing a communication-efficient array is considered in this paper. An algorithm is proposed to refine the communication among processors within the target array constructed by GPR. Experimental study shows that GPR produces target arrays with higher harvest and lower degradation on host arrays with fault density of no more than 5%. In addition, the communication performance is significantly improved by reducing the number of long interconnects, and the average improvement is about 34% for all cases considered in this paper.

high performance computing and communications | 2013

An Efficient Topology Reconfiguration Algorithm for NoC Based Multiprocessor Arrays

Chao Wang; Jigang Wu; Guiyuan Jiang; Jizhou Sun

To realize the reliability of a high-performance multiprocessor system with a reconfigurable interconnect, there is a need to compute an interconnect topology that allows for a high-throughput load distribution on top of the physical processor array. In this paper, we investigate the problem of topology reconfiguration for Network on Chip (NoC) based multiprocessor arrays with faulty processing elements (PEs). We propose two types of shift operations, i.e., the row bi-shift operation and the column shift operation, for redistributing fault-free PEs of a processor array in reconfiguration. We solve the topology reconfiguration problem by developing two efficient algorithms. The first algorithm, denoted as CRS, generates a logical topology of desirable communication performance by alternately performing the two shift operations. The second algorithm revises the initial topology produced by CRS to further improve the communication performance, using tabu search techniques. Experimental results validate the efficiency of the proposed algorithms in comparison to previous approaches. For 16×16 physical arrays with 30% faulty PEs, the proposed approaches improve on existing algorithms by up to 39% in terms of message latency and congestion.

international conference on parallel and distributed systems | 2012

Non-Backtracking Reconfiguration Algorithm for Three-dimensional VLSI Arrays

Guiyuan Jiang; Jigang Wu; Jizhou Sun

Fast reconfiguration is one of the main challenges in fault-tolerant VLSI arrays. In these arrays, there are some invalid processing elements (PEs) that are fault-free but cannot be used to form a target array. These invalid PEs lead to backtracking in reconfiguration. This paper proposes a non-backtracking reconfiguration (NBR) algorithm for three-dimensional degradable VLSI arrays with faults. The proposed algorithm accelerates the reconfiguration without loss of harvest, by eliminating the backtracking operation that frequently occurs in the existing algorithm (named BGPR) cited in this paper. Initially, the invalid PEs are identified in a preprocessing step on the host array. Then the NBR algorithm constructs each logical plane from bottom to top in the host array, and updates the set of invalid PEs in the host array after each logical plane is constructed. Experimental results show that the NBR algorithm is more scalable than the BGPR algorithm, and thus it can reconfigure large host arrays much faster. In addition, the runtime of the NBR algorithm tends to decrease with increasing fault density, rather than increase as it does for the BGPR algorithm.

IEEE ACM Transactions on Networking | 2017

Joint Charging Tour Planning and Depot Positioning for Wireless Sensor Networks Using Mobile Chargers

Guiyuan Jiang; Siew Kei Lam; Yidan Sun; Lijia Tu; Jigang Wu

Recent breakthroughs in wireless energy transfer technology have enabled wireless sensor networks (WSNs) to operate with zero downtime through the use of mobile energy chargers (MCs) that periodically replenish the energy supply of the sensor nodes. Due to the limited battery capacity of the MCs, a significant number of MCs and charging depots are required to guarantee perpetual operation in large-scale networks. Existing methods for reducing the number of MCs and charging depots treat the charging tour planning and depot positioning problems separately even though they are interdependent. This paper is the first to jointly consider charging tour planning and MC depot positioning for large-scale WSNs. The proposed method solves the problem in three stages: charging tour planning, candidate depot identification and reduction, and depot deployment and charging tour assignment. The proposed charging scheme also considers the association between the MC charging cycle and the operational lifetime of the sensor nodes, in order to maximize the energy efficiency of the MCs. This overcomes the limitations of existing approaches, wherein MCs with small battery capacity end up charging sensor nodes more frequently than necessary, while MCs with large battery capacity return to the depots to replenish themselves before they have fully transferred their energy to the sensor nodes. Compared with existing approaches, the proposed method leads to an average reduction in the number of MCs of 64%, and an average increase of 19.7 times in the ratio of total charging time to total traveling time.

Computers & Electrical Engineering | 2016

Algorithms for bi-objective multiple-choice hardware/software partitioning

Wenjun Shi; Jigang Wu; Siew Kei Lam; Thambipillai Srikanthan

This paper proposes three algorithms for multiple-choice hardware-software partitioning with the objectives of minimizing execution time and power consumption, while meeting an area constraint. First, a heuristic algorithm is proposed to rapidly generate an approximate solution. The second algorithm refines the approximate solution using a customized tabu search. Finally, a dynamic programming algorithm is proposed to calculate the exact solution. Simulation results show that the approximate solution is very close to the exact solution. It can be further refined by tabu search to achieve a solution with less than 1.5% error for all cases considered in this paper.
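To make the dynamic programming idea concrete, here is a minimal sketch of a multiple-choice knapsack recurrence, which is one standard way to pick exactly one implementation option per task under an area budget. It is not the authors' algorithm: it assumes a single objective (execution time only, ignoring power), integer area costs, and additive times. The function name `partition_min_time` and the `(time, area)` option format are hypothetical.

```python
from math import inf

def partition_min_time(tasks, area_limit):
    """Pick one (time, area) option per task, minimizing total time.

    tasks: list of option lists; each option is (time, area), area a
    non-negative int. Simplifying assumption: single objective only.
    Returns (best_time, chosen_option_indices) or (inf, None).
    """
    # best[a] = minimal total time of the tasks seen so far with area <= a
    best = [0] * (area_limit + 1)
    picks = []  # picks[i][a] = option chosen for task i at budget a
    for options in tasks:
        new_best = [inf] * (area_limit + 1)
        pick = [-1] * (area_limit + 1)
        for a in range(area_limit + 1):
            for k, (time, area) in enumerate(options):
                if area <= a and best[a - area] + time < new_best[a]:
                    new_best[a] = best[a - area] + time
                    pick[a] = k
        best = new_best
        picks.append(pick)
    if best[area_limit] == inf:
        return inf, None
    # Walk backwards through the tables to recover the chosen options.
    choices, a = [], area_limit
    for options, pick in zip(reversed(tasks), reversed(picks)):
        k = pick[a]
        choices.append(k)
        a -= options[k][1]
    choices.reverse()
    return best[area_limit], choices

# Option 0 of each task is software (area 0); the others are hardware variants.
tasks = [[(10, 0), (4, 3), (2, 5)], [(8, 0), (3, 4)]]
print(partition_min_time(tasks, 7))  # (7, [1, 1])
```

The table has one row per task and one column per area value, so the running time is proportional to the number of options times the area limit. A bi-objective version would need to track a set of non-dominated (time, power) pairs per cell rather than a single number.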

international symposium on parallel architectures algorithms and programming | 2014

Algorithmic Aspects for Bi-Objective Multiple-Choice Hardware/Software Partitioning

Wenjun Shi; Jigang Wu; Siew Kei Lam; Thambipillai Srikanthan

Designing embedded systems has become a challenging process due to the increasing complexity of the applications. In addition, there is a need to meet multiple conflicting constraints such as speed, power and cost. These factors have led to an explosion in the design space, as each task in the application can have various implementation options (software and a range of hardware customizations), where each implementation option is associated with a different speed, power and cost. In this paper, we propose hardware-software (HW/SW) partitioning algorithms that are capable of managing the large design space by taking into account the multiple implementation choices. In particular, we focus on multiple-choice HW/SW partitioning with the following objectives: minimizing execution time and power consumption, while meeting the area constraint. Two algorithms are presented: 1) a heuristic method based on the bi-objective knapsack problem to rapidly generate an approximate solution, and 2) a dynamic programming algorithm to calculate the exact solution. Simulation results show that the heuristic method produces results that are very close to the exact ones.

ieee acm international symposium cluster cloud and grid computing | 2017

DOTA: Delay Bounded Optimal Cloudlet Deployment and User Association in WMANs

Longjie Ma; Jigang Wu; Long Chen

In a large-scale Wireless Metropolitan Area Network (WMAN) consisting of many wireless Access Points (APs), choosing appropriate positions to place cloudlets is very important for reducing the users' access delay. For a service provider, it is always very costly to deploy cloudlets. How many cloudlets should be placed in a WMAN and how much resource each cloudlet should have are therefore very important questions for the service provider. In this paper, we study the cloudlet placement and resource allocation problem in a large-scale WMAN. We formulate the problem as a novel cloudlet placement problem: given an average access delay between mobile users and the cloudlets, place K cloudlets at strategic locations in the WMAN with the objective of minimizing the number of cloudlets used, K. We then propose an exact solution to the problem by formulating it as an Integer Linear Program (ILP). Due to the poor scalability of the ILP, we devise a clustering algorithm, K-Medoids (KM), for the problem. For a special case of the problem where the computing capabilities of all cloudlets are given, we propose an efficient heuristic. We finally evaluate the performance of the proposed algorithms through experimental simulations. Simulation results demonstrate that the proposed algorithms are effective.
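The clustering idea can be illustrated with a generic k-medoids loop wrapped in a search for the smallest K that meets a delay bound. This is a sketch under assumptions, not the KM algorithm from the paper: it assumes a symmetric AP-to-AP delay matrix with positive off-diagonal entries, one user per AP, a deterministic initial guess, and no resource capacities. The names `k_medoids` and `min_cloudlets` are hypothetical.

```python
def k_medoids(delay, k, max_iter=100):
    """Generic alternating k-medoids on a delay matrix (assumed setup).

    delay[i][j]: access delay between AP i and AP j.
    Returns (sorted medoid indices, average delay to the nearest medoid).
    """
    n = len(delay)
    medoids = list(range(k))  # deterministic initial guess (assumption)
    for _ in range(max_iter):
        # Assign every AP to its nearest current medoid.
        clusters = {m: [] for m in medoids}
        for i in range(n):
            nearest = min(medoids, key=lambda m: delay[i][m])
            clusters[nearest].append(i)
        # In each cluster, move the medoid to the member with least total delay.
        new_medoids = [
            min(members, key=lambda c: sum(delay[i][c] for i in members))
            for members in clusters.values()
        ]
        if sorted(new_medoids) == sorted(medoids):
            break
        medoids = new_medoids
    avg = sum(min(delay[i][m] for m in medoids) for i in range(n)) / n
    return sorted(medoids), avg

def min_cloudlets(delay, bound):
    """Smallest K whose k-medoids placement meets the average delay bound."""
    for k in range(1, len(delay) + 1):
        medoids, avg = k_medoids(delay, k)
        if avg <= bound:
            return k, medoids
    return None

# Six APs on a line; the delay is the distance between their positions.
pos = [0, 1, 2, 10, 11, 12]
delay = [[abs(p - q) for q in pos] for p in pos]
print(min_cloudlets(delay, 1.0))  # (2, [1, 4])
```

Because k-medoids is a local search, the K found this way is an upper bound on the true minimum rather than a guaranteed optimum, which is consistent with the abstract presenting the ILP as the exact method and clustering as the scalable alternative.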

network and parallel computing | 2013

Efficiency of Flexible Rerouting Scheme for Maximizing Logical Arrays

Guiyuan Jiang; Jigang Wu; Jizhou Sun

In a multiprocessor array, some processing elements (PEs) fail to function normally due to hardware defects or soft faults caused by overheating, overload or occupancy by other running applications. The fault-tolerant reconfiguration considered in this paper reorganizes the fault-free PEs of a processor array with faults into a logical array of regular mesh topology by changing the interconnections among PEs. This paper demonstrates the efficiency of the flexible rerouting scheme in maximizing the usage of the fault-free PEs, by developing an efficient reconfiguration algorithm without backtracking. The proposed algorithm constructs each logical column from left to right on candidate PE sets. Once a logical column is formed, it updates the candidate sets by excluding the PEs that cannot be used. It is also proved that the proposed heuristic algorithm is able to generate the maximum-size logical array in linear time. Experimental results show that 123 logical columns can be constructed on 256 × 256 host arrays with fault density of 30%, resulting in an improvement of 51% in comparison to the previous algorithm, by which only 82 logical columns can be produced. Furthermore, our algorithm is able to generate target arrays with harvest over 56% on host arrays with fault density of 50%, while the previous work cited in this paper fails to construct any target array in this case.

The Journal of Supercomputing | 2015

Algorithmic aspects of graph reduction for hardware/software partitioning

Guiyuan Jiang; Jigang Wu; Siew Kei Lam; Thambipillai Srikanthan; Jizhou Sun

Hardware/software (HW/SW) partitioning is a major concern in heterogeneous multi-processor system-on-a-chip design, where the large design space prohibits rapid identification of optimal HW/SW solutions to meet tight time-to-market constraints. In this paper, we propose graph reduction techniques to reduce the design space for HW/SW partitioning without sacrificing the partition quality. There are two major phases in the proposed approach: reducible sub-graph searching, and sub-graph evaluation and reduction. In the former phase, we design a dynamic programming-based algorithm, named path flow algorithm (PFA), to identify reducible sub-graph candidates in a directed acyclic graph (DAG), since most previous works use a DAG as the task graph model. We also propose algorithm DeLoop to transform an arbitrary directed graph into a DAG such that all reducible sub-graphs on the original graph can be detected by performing algorithm PFA on the DAG. Our approach overcomes the limitation of the existing approach by enabling the identification of candidate sub-graphs in arbitrary task graphs. In the latter phase, we propose a reduction model which enables accurate estimation of task execution time on hardware, and design a method to select candidate sub-graphs for reduction. Experimental results demonstrate that the proposed methods not only reduce the design space, but also notably improve the partitioning quality, since hardware-parallel execution of tasks is taken into account in the proposed sub-graph reduction model.

international conference on algorithms and architectures for parallel processing | 2014

Reducing the Interconnection Length for 3D Fault-Tolerant Processor Arrays

Guiyuan Jiang; Jigang Wu; Jizhou Sun; Longting Zhu

The three-dimensional (3D) processor array has the benefits of reducing interconnection latency, consuming less power and improving bandwidth compared to 2D processor arrays. However, it suffers from frequent faults due to overheating during massively parallel computing. To achieve fault tolerance under such a scenario, an effective method is to construct a non-faulty sub-array, as large as possible, from the faulty array, such that the original application can still work on the sub-array. However, logical sub-arrays produced by previous works contain a large number of long interconnects, which leads to more communication cost, capacitance and dynamic power dissipation. In this paper, we investigate the problem of reducing the interconnection length of a logical array. First, we prove that it is an NP-hard problem. Then we propose an efficient heuristic to reduce the interconnection redundancy of a logical array by reducing the number of long interconnects in each logical plane. Each logical plane is optimized based on statistical information. Experimental results show that, on 32×32×32 host arrays with fault densities ranging from 0.1% to 5%, the proposed algorithm is capable of reducing the interconnection length by 49.7% and 29.8% on average compared to the existing algorithms GPR and CAR, respectively.
