Xiaohang Wang

South China University of Technology

Publication

Featured researches published by Xiaohang Wang.

IEEE Transactions on Computers | 2016

On Fine-Grained Runtime Power Budgeting for Networks-on-Chip Systems

Xiaohang Wang; Baoxin Zhao; Terrence S. T. Mak; Mei Yang; Yingtao Jiang; Masoud Daneshtalab

Power budgeting is an essential aspect of networks-on-chip (NoC) to meet the power constraint for on-chip communications while assuring the best possible overall system performance. For simplicity and ease of implementation, existing NoC power budgeting schemes treat all the individual routers uniformly when allocating power to them. However, such homogeneous power budgeting schemes ignore the fact that the workloads of different NoC routers may vary significantly, and thus may provide excess power to routers with low workloads and insufficient power to those with high workloads. In this paper, we formulate the NoC power budgeting problem in order to optimize the network performance over a power budget through per-router frequency scaling. We take into account the heterogeneous workloads across different routers as imposed by variations in traffic. Correspondingly, we propose a fine-grained solution using an agile algorithm with low time complexity. The frequency of each router is set individually according to its contribution to the average network latency while meeting the power budget. Experimental results have confirmed that, with fairly low runtime and hardware overhead, the proposed scheme can help save up to […] percent of application execution time when compared with the latest proposed methods.

Parallel, Distributed and Network-Based Processing | 2015

DeFrag: Defragmentation for Efficient Runtime Resource Allocation in NoC-Based Many-core Systems

Jim Ng; Xiaohang Wang; Amit Kumar Singh; Terrence S. T. Mak

Networks on Chips | 2016

Bubble budgeting: throughput optimization for dynamic workloads by exploiting dark cores in many core systems

Xiaohang Wang; Amit Kumar Singh; Bing Li; Yang Yang; Terrence S. T. Mak; Hong Li

Microprocessors and Microsystems | 2016

A Pareto-optimal runtime power budgeting scheme for many-core systems

Xiaohang Wang; Baoxin Zhao; Ling Wang; Terrence S. T. Mak; Mei Yang; Yingtao Jiang; Masoud Daneshtalab

Efficient runtime resource allocation is critical to the overall performance and energy consumption of many-core systems. However, because applications arrive and depart at unknown times under dynamic workloads, runtime resource management is challenging. The frequent allocations and deallocations of applications might leave free on-chip cores scattered, owing to the lack of design-time knowledge of their finishing times. This situation is referred to as fragmentation. To optimize the performance and energy consumption of the system in such situations, we propose a runtime defragmentation approach that collects the scattered cores and reshapes them into close proximity. We also propose a fragmentation metric that evaluates the scatteredness of the free cores. Based on this metric, the proposed algorithm is executed to bring the scattered free cores together when the metric exceeds a predefined threshold. In this way, a contiguous free core region is formed to facilitate efficient mapping of incoming applications. Moreover, the proposed algorithm is aware of the existing applications and minimizes the impact on their performance. Experimental results demonstrate that the proposed defragmentation approach reduces the overall execution time and energy consumption by 42% and 41%, respectively, when compared to some existing approaches. Moreover, the defragmentation process incurs a negligible overhead of less than 2.6% of the overall execution time.
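The threshold-triggered idea in the abstract above can be illustrated with a minimal sketch. The abstract does not define the fragmentation metric, so the one used here (mean pairwise Manhattan distance between free cores on a 2D mesh) is an assumption chosen only for illustration, and the names `fragmentation` and `needs_defragmentation` are hypothetical.

```python
from itertools import combinations


def fragmentation(free_cores):
    """Mean pairwise Manhattan distance between free cores.

    This is a stand-in metric (an assumption), not the one defined in the paper.
    free_cores is an iterable of (x, y) mesh coordinates.
    """
    cores = list(free_cores)
    if len(cores) < 2:
        return 0.0
    pairs = list(combinations(cores, 2))
    total = sum(abs(ax - bx) + abs(ay - by) for (ax, ay), (bx, by) in pairs)
    return total / len(pairs)


def needs_defragmentation(free_cores, threshold):
    """Trigger defragmentation only when the metric exceeds the threshold."""
    return fragmentation(free_cores) > threshold


# Four free cores forming a 2x2 block versus four cores in the mesh corners.
contiguous = [(0, 0), (0, 1), (1, 0), (1, 1)]
scattered = [(0, 0), (0, 3), (3, 0), (3, 3)]

print(needs_defragmentation(contiguous, threshold=2.0))  # False
print(needs_defragmentation(scattered, threshold=2.0))   # True
```

A compact block scores low and a scattered set scores high, so a single threshold decides when the cost of migrating tasks is likely to be worthwhile.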

IEEE Transactions on Very Large Scale Integration Systems | 2016

Defragmentation for Efficient Runtime Resource Management in NoC-Based Many-Core Systems

Jim Ng; Xiaohang Wang; Amit Kumar Singh; Terrence S. T. Mak

Not all the cores of a many-core chip can be active at the same time, for reasons such as low CPU utilization in server systems and the limited power budget in the dark silicon era. These free cores (referred to as bubbles) can be placed near active cores for heat dissipation so that the active cores can run at a higher frequency level, boosting the performance of active cores and applications. Budgeting inactive cores (bubbles) to workloads to boost performance poses three challenges. First, the number of bubbles varies due to dynamic workloads. Second, communication distance increases when a bubble is inserted between two communicating tasks, leading to performance degradation. Third, budgeting too many bubbles as coolers for running applications leaves insufficient cores for future applications. To address these challenges, this paper proposes a bubble budgeting scheme that budgets free cores to each application so as to optimize the throughput of the whole system, including the execution time of each application and the waiting time incurred for newly arrived applications. Essentially, the proposed algorithm determines the number and locations of bubbles to optimize the performance and waiting time of each application, after which the tasks of each application are mapped to a core region. Experiments show that our approach achieves 50% higher throughput when compared to state-of-the-art thermal-aware runtime task mapping approaches.

IEEE Transactions on Computers | 2016

Adaptive Routing Algorithms for Lifetime Reliability Optimization in Network-on-Chip

Liang Wang; Xiaohang Wang; Terrence S. T. Mak

Due to ever-escalating power consumption, a significant proportion of future many-core chips must be switched off to meet the power budget. This trend has brought about a paradigm shift from conventional low-power designs to power budgeting designs, where performance optimization needs to be performed under a tight power budget constraint. There are two key issues to be considered when moving this new design paradigm forward. First, with per-core frequency scaling, the number of frequency combinations of the cores grows exponentially. As more cores are integrated onto a chip, it becomes more challenging to achieve the optimal performance over a given power budget. Second, the power budget of a many-core system might fluctuate rapidly. Consequently, the power budgeting scheme needs to respond promptly to track such power budget variation. This paper aims to solve the problem of optimizing overall performance over a power budget using frequency scaling. To solve the problem efficiently at runtime, we propose a parallel dynamic programming network in which the Pareto-optimal solutions can be obtained with linear time complexity. Experimental results have confirmed that the proposed approach can reduce the execution time by 45% when compared to other existing methods. The runtime overhead and hardware cost of the proposed approach are reasonably small: the average area and power consumption are less than 1% of the whole network-on-chip. This paper demonstrates an effective formulation for delivering Pareto-optimal solutions for power budgeting in future many-core systems.
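To make the optimization problem in the abstract above concrete, here is a small sequential dynamic programming sketch. It is not the parallel dynamic programming network the paper proposes. It only illustrates choosing one frequency level per core so that total performance is maximized within a power budget. The function name `budget_frequencies`, the integer power units, and the (power, performance) numbers are all assumptions made for illustration.

```python
def budget_frequencies(levels, budget):
    """Pick one (power, performance) option per core within a power budget.

    levels[i] lists the hypothetical options for core i, with integer power.
    Returns (best_total_performance, chosen_option_indices), or None when no
    combination fits the budget.
    """
    # best[p] = (performance, choices) for the best assignment using power p.
    best = {0: (0.0, [])}
    for options in levels:
        nxt = {}
        for used, (perf, picks) in best.items():
            for idx, (power, gain) in enumerate(options):
                total = used + power
                if total > budget:
                    continue  # this option would violate the power budget
                candidate = (perf + gain, picks + [idx])
                if total not in nxt or candidate[0] > nxt[total][0]:
                    nxt[total] = candidate
        best = nxt
    if not best:
        return None
    return max(best.values(), key=lambda entry: entry[0])


# Two cores, three frequency levels each (illustrative numbers only).
levels = [
    [(1, 1.0), (2, 1.8), (4, 2.5)],
    [(1, 1.0), (2, 1.5), (4, 3.0)],
]
print(budget_frequencies(levels, budget=5))  # (4.0, [0, 2])
```

The table `best` keeps one entry per reachable power total, so the work per core is bounded by the budget times the number of levels rather than growing with the exponential number of frequency combinations.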

Great Lakes Symposium on VLSI | 2017

Throughput Optimization for Lifetime Budgeting in Many-Core Systems

Liang Wang; Xiaohang Wang; Ho-fung Leung; Terrence S. T. Mak

Efficient runtime resource allocation is critical to the overall performance and energy consumption of many-core systems. A region of free cores is allocated for each newly launched application. The cores are deallocated when the corresponding applications finish execution. The frequent allocations and deallocations of the cores might leave free cores scattered (not forming a contiguous region). This situation is referred to as fragmentation. Fragmentation could cause inefficient mapping of incoming applications, i.e., long communication distances between communicating cores. This in turn leads to poor performance and high energy consumption. In this paper, we propose a runtime defragmentation scheme that collects the scattered cores and reallocates them in close proximity. We first define a fragmentation metric that evaluates the scatteredness level of the free cores. Based on this metric, the proposed algorithm is executed to bring the scattered free cores together when the fragmentation metric exceeds a predefined threshold. In this way, a contiguous free core region is formed to facilitate the efficient mapping of incoming applications. Moreover, the proposed algorithm also aims to minimize the negative impact on the performance of existing applications. Experimental results show that the proposed defragmentation scheme reduces the overall execution time and the energy consumption by 42% and 41%, respectively, when it is added to existing runtime mapping algorithms. Moreover, the proposed defragmentation process incurs a negligible overhead of less than 2.6% of the overall execution time. The proposed defragmentation scheme is an effective resource management enhancement to existing runtime mapping algorithms for many-core systems.

Microprocessors and Microsystems | 2016

On runtime adaptive tile defragmentation for resource management in many-core systems

Xiaohang Wang; Ting Fei; Boquan Zhang; Terrence S. T. Mak

Technology scaling makes reliability a primary concern in Network-on-Chip (NoC) design. We observe that, due to the routing algorithm, some routers age much faster than others, which becomes a bottleneck for NoC lifetime. In this paper, lifetime is modeled as a resource consumed over time. A metric, the lifetime budget, is associated with each router, indicating the maximum allowed workload for the current period. Since the heterogeneity in router lifetime reliability correlates strongly with the routing algorithm, we define a problem to optimize the lifetime by routing packets along the path with maximum lifetime budgets. The problem is then extended to optimize both performance and lifetime reliability. The lifetime is optimized on a long-term time scale while performance is optimized on a short-term time scale. Two dynamic programming-based adaptive routing algorithms (lifetime aware routing and multi-objective routing) are proposed to solve the problems. In the experiments, the lifetime aware routing and multi-objective routing algorithms are evaluated with synthetic traffic and real benchmarks, respectively. The experimental results show that lifetime aware routing improves the minimal lifetime by around 20, 45 and 55 percent over XY routing, NoP routing and odd-even routing, respectively. In addition, the multi-objective adaptive routing algorithm can effectively improve both performance and lifetime.
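The idea of routing along the path with maximum lifetime budgets can be sketched as a small dynamic program over a 2D mesh. This is only an illustration under stated assumptions: the search is restricted to minimal paths, the path score is the sum of router budgets, and the budget values and the name `best_minimal_path` are hypothetical. The paper's actual formulation may differ.

```python
from functools import lru_cache


def best_minimal_path(budget, src, dst):
    """Return (total_budget, path) for the best minimal path on a mesh.

    budget[y][x] is the remaining lifetime budget of router (x, y)
    (hypothetical values). Scoring a path by the sum of budgets is an
    assumption made for this sketch. src and dst are (x, y) tuples.
    """
    step_x = 1 if dst[0] >= src[0] else -1
    step_y = 1 if dst[1] >= src[1] else -1

    @lru_cache(maxsize=None)
    def solve(x, y):
        here = budget[y][x]
        if (x, y) == dst:
            return here, ((x, y),)
        options = []
        if x != dst[0]:
            options.append(solve(x + step_x, y))  # hop along the x dimension
        if y != dst[1]:
            options.append(solve(x, y + step_y))  # hop along the y dimension
        total, path = max(options, key=lambda option: option[0])
        return here + total, ((x, y),) + path

    return solve(*src)


# 3x3 mesh with illustrative lifetime budgets, indexed as mesh[y][x].
mesh = [
    [5, 1, 4],
    [3, 9, 2],
    [7, 6, 8],
]
total, path = best_minimal_path(mesh, (0, 0), (2, 2))
print(total)  # 31
print(path)   # ((0, 0), (0, 1), (1, 1), (1, 2), (2, 2))
```

Each router's best score depends only on its neighbors closer to the destination, which is the property that lets this kind of computation be solved by dynamic programming.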

Journal of Systems Architecture | 2018

Effectiveness of HT-assisted sinkhole and blackhole denial of service attacks targeting mesh networks-on-chip

Li Zhang; Xiaohang Wang; Yingtao Jiang; Mei Yang; Terrence S. T. Mak; Amit Kumar Singh

Due to technology scaling, lifetime reliability is becoming one of the major design constraints in the design of future many-core systems. In this paper, we propose a novel runtime mapping scheme that dynamically maps applications under a given lifetime reliability constraint. A borrowing strategy is adopted to manage the lifetime on a long-term scale, and the lifetime constraint can be relaxed on a short-term scale when the communication performance requirement is high. Throughput can be improved because the communication performance of communication-intensive applications is optimized while the waiting time of computation-intensive applications is reduced. Furthermore, an improved neighborhood allocation method is proposed for the runtime mapping scheme. The experimental results show that, compared to the state-of-the-art lifetime-constrained mapping, the proposed mapping scheme achieves over 20% throughput improvement.

IEEE Transactions on Computers | 2018

Bubble Budgeting: Throughput Optimization for Dynamic Workloads by Exploiting Dark Cores in Many Core Systems

Xiaohang Wang; Amit Kumar Singh; Bing Li; Yang Yang; Hong Li; Terrence S. T. Mak

Before an application can be launched in a many-core system, it must first be mapped to a number of tiles (cores). Such an online application mapping process may unfortunately lead to a serious resource leak problem, referred to as tile fragmentation, in which the free (uncommitted) tiles of any single contiguous region are inadequate to accommodate the performance needs of an incoming application, although the total number of free tiles may still exceed what is required to service this application. When applications have to be mapped to noncontiguous tiles due to fragmentation, there is an obvious performance penalty due to increased communication distances. As a result, defragmentation that consolidates fragmented tiles needs to be routinely exercised, and this defragmentation process must not introduce high computation overhead that could otherwise adversely impact system performance. In this paper, we propose a task migration-based adaptive tile defragmentation algorithm that helps consolidate running applications through online task migration. This algorithm relocates the applications’ tile regions so that a contiguous free tile region is formed and maintained. By doing so, future applications can be mapped to a region with low communication distance. Both the computation overhead and the quality of the defragmentation result of the proposed algorithm are adaptively set in response to the system workloads. Enabled by its low overhead, the proposed defragmentation algorithm is an effective resource management enhancement to existing runtime task-to-tile mapping methods, with as much as 3× system throughput improvement observed in some experiments.
