Publication


Featured research published by Zhiliang Qian.


International Conference on Hardware/Software Codesign and System Synthesis | 2012

A traffic-aware adaptive routing algorithm on a highly reconfigurable network-on-chip architecture

Zhiliang Qian; Paul Bogdan; Guopeng Wei; Chi-Ying Tsui; Radu Marculescu

In this paper, we propose a flexible NoC architecture and a dynamic distributed routing algorithm which can enhance the NoC communication performance with minimal energy overhead. In particular, our proposed NoC architecture exploits the following two features: i) self-reconfigurable bidirectional channels to increase the effective bandwidth and ii) express virtual paths, as well as localized hub routers, to bypass some intermediate nodes at run time in the network. A deadlock-free and traffic-aware dynamic routing algorithm is further developed to take advantage of the increased flexibility in the proposed architecture. Both the channel self-reconfiguration and the routing decisions are made in a distributed fashion, based on a function of the localized traffic conditions, in order to maximize the performance and minimize the energy costs at the macroscopic level. Our simulation results show that the proposed approach can reduce the network latency by 30% to 80% in most cases compared to a conventional unidirectional mesh topology, while incurring less than 15% power overhead.


Asia and South Pacific Design Automation Conference | 2014

A comprehensive and accurate latency model for Network-on-Chip performance analysis

Zhiliang Qian; Da-Cheng Juan; Paul Bogdan; Chi-Ying Tsui; Diana Marculescu; Radu Marculescu

In this work, we propose a new, accurate, and comprehensive analytical model for Network-on-Chip (NoC) performance analysis. Given the application communication graph, the NoC architecture, and the routing algorithm, the proposed framework analyzes the link dependencies and then determines the ordering of queuing analysis for performance modeling. The channel waiting times in the links are estimated using a generalized G/G/1/K queuing model, which can tackle bursty traffic and dependent arrival times with general service time distributions. The proposed model is general and can be used to analyze various traffic scenarios for NoC platforms with arbitrary buffer and packet lengths. Experimental results on both synthetic and real applications demonstrate the accuracy and scalability of the newly proposed model.
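
To give a feel for how a general-arrival, general-service queue turns traffic burstiness into channel waiting time, the sketch below uses Kingman's approximation for an infinite-buffer G/G/1 queue. This is a simpler, well-known stand-in and not the paper's G/G/1/K model, which also accounts for finite buffers and link dependencies. The function name and parameters are hypothetical and chosen only for illustration.

```python
def kingman_waiting_time(arrival_rate, mean_service, ca2, cs2):
    """Approximate mean queueing delay of a G/G/1 queue (Kingman's formula).

    arrival_rate: mean packet arrivals per cycle at the channel (assumed input)
    mean_service: mean channel service time in cycles (assumed input)
    ca2, cs2:     squared coefficients of variation of the inter-arrival
                  and service times; ca2 > 1 models bursty arrivals
    """
    rho = arrival_rate * mean_service  # channel utilization
    if not 0 <= rho < 1:
        raise ValueError("queue is unstable: utilization must be in [0, 1)")
    # Wq ~= rho / (1 - rho) * (ca2 + cs2) / 2 * mean service time
    return (rho / (1.0 - rho)) * ((ca2 + cs2) / 2.0) * mean_service


# Poisson arrivals and exponential service (ca2 = cs2 = 1) reduce to M/M/1.
smooth = kingman_waiting_time(0.5, 1.0, 1.0, 1.0)
# Same load, but burstier arrivals (ca2 = 4) wait longer on average.
bursty = kingman_waiting_time(0.5, 1.0, 4.0, 1.0)
```

At the same utilization, the bursty case yields a longer estimated wait than the Poisson case, which is the kind of effect a Markovian-only model would miss.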


Design, Automation, and Test in Europe | 2013

SVR-NoC: a performance analysis tool for network-on-chips using learning-based support vector regression model

Zhiliang Qian; Da-Cheng Juan; Paul Bogdan; Chi-Ying Tsui; Diana Marculescu; Radu Marculescu

In this work, we propose SVR-NoC, a learning-based support vector regression (SVR) model for evaluating Network-on-Chip (NoC) latency performance. Unlike state-of-the-art NoC analytical models, which use classical queuing theory to directly compute the average channel waiting time, the proposed SVR-NoC model performs NoC latency analysis by learning from typical training data. More specifically, we develop a systematic machine-learning framework that uses the kernel-based support vector regression method to predict the channel average waiting time and the traffic flow latency. Experimental results show that SVR-NoC can predict the average packet latency accurately while achieving about 120X speed-up over simulation-based evaluation methods.


Asia and South Pacific Design Automation Conference | 2011

A thermal-aware application specific routing algorithm for network-on-chip design

Zhiliang Qian; Chi-Ying Tsui

In this work, we propose an application specific routing algorithm to reduce the hot-spot temperature for Network-on-chip (NoC). Using the traffic information of applications, we develop a routing scheme which can achieve a higher adaptivity than the generic ones and at the same time distribute the traffic more uniformly. A set of deadlock-free admissible paths for all the communications is first obtained. To reduce the hot-spot temperature, we find the optimal distribution ratio of the communication traffic among the set of candidate paths. The problem of finding this optimal distribution ratio is formulated as a linear programming (LP) problem and is solved offline. A router microarchitecture which supports our ratio-based selection policy is also proposed. From the simulation results, the peak energy reduction considering the energy consumption of both the processors and routers can be as high as 16.6% for synthetic traffic and real benchmarks.


Networks on Chips | 2014

An efficient Network-on-Chip (NoC) based multicore platform for hierarchical parallel genetic algorithms

Yuankun Xue; Zhiliang Qian; Guopeng Wei; Paul Bogdan; Chi-Ying Tsui; Radu Marculescu

In this work, we propose a new Network-on-Chip (NoC) architecture for implementing the hierarchical parallel genetic algorithm (HPGA) on a multi-core System-on-Chip (SoC) platform. We first derive the speedup metric of an NoC architecture which directly maps the HPGA onto NoC in order to identify the main sources of performance bottlenecks. Specifically, it is observed that the speedup is mostly affected by the fixed bandwidth that a master processor can use and the low utilization of slave processor cores. Motivated by the theoretical analysis, we propose a new architecture with two multiplexing schemes, namely dynamic injection bandwidth multiplexing (DIBM) and time-division based island multiplexing (TDIM), to improve the speedup and reduce the hardware requirements. Moreover, a task-aware adaptive routing algorithm is designed for the proposed architecture, which can take advantage of the proposed multiplexing schemes to further reduce the hardware overhead. We demonstrate the benefits of our approach using the problem of protein folding prediction, which is a process of importance in biology. Our experimental results show that the proposed NoC architecture achieves up to 240X speedup compared to a single island design. The hardware cost is also reduced by 50% compared to a direct NoC-based HPGA implementation.


Design Automation Conference | 2014

Disease Diagnosis-on-a-Chip: Large Scale Networks-on-Chip based Multicore Platform for Protein Folding Analysis

Yuankun Xue; Zhiliang Qian; Paul Bogdan; Fan Ye; Chi-Ying Tsui

Protein folding is critical for many biological processes. In this work, we propose an NoC-based multi-core platform for protein folding computation. We first identify the speedup bottleneck for applying a conventional genetic algorithm on a mesh-based multi-core platform. Then, we address this computation- and communication-intensive problem while taking into account both hardware and software aspects. Specifically, we group the processing cores into islands and propose an NoC-based multicore architecture for intra- and inter-island communication. The high scalability of the proposed platform allows us to integrate from 100 to 1200 cores for the folding computation. We then propose a genetic migration algorithm to take advantage of the massively parallel platform. Our simulation results show that the proposed platform offers near-linear speedup as the number of cores increases. We also report the hardware cost in area and power based on a 100-core FPGA prototype.


IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems | 2016

A Support Vector Regression (SVR)-Based Latency Model for Network-on-Chip (NoC) Architectures

Zhiliang Qian; Da-Cheng Juan; Paul Bogdan; Chi-Ying Tsui; Diana Marculescu; Radu Marculescu

In this paper, we propose SVR-NoC, a network-on-chip (NoC) latency model using support vector regression (SVR). More specifically, based on the application communication information and the NoC routing algorithm, the channel and source queue waiting times are first estimated using an analytical queuing model with two equivalent queues. To improve the prediction accuracy, the queuing theory-based delay estimations are included as features in the learning process. We then propose a learning framework that relies on SVR to collect training data and predict the traffic flow latency. The proposed learning methods can be used to analyze various traffic scenarios for the target NoC platform. Experimental results on both synthetic and real-application traffic demonstrate that, on average, less than 12% prediction error in network saturation load, as well as more than 100× speedup compared to cycle-accurate simulations, can be achieved.


ACM Transactions on Design Automation of Electronic Systems | 2016

Performance Evaluation of NoC-Based Multicore Systems: From Traffic Analysis to NoC Latency Modeling

Zhiliang Qian; Paul Bogdan; Chi-Ying Tsui; Radu Marculescu

In this survey, we review several approaches for predicting performance of Network-on-Chip (NoC)-based multicore systems, starting from the traffic models to the complex NoC models for latency evaluation. We first review typical traffic models to represent the application workloads in NoC. Specifically, we review Markovian and non-Markovian (e.g., self-similar or long-range memory-dependent) traffic models and discuss their applications on multicore platform design. Then, we review the analytical techniques to predict NoC performance under given input traffic. We investigate analytical models for average as well as maximum delay evaluation. We also review the developments and design challenges of NoC simulators. One interesting research direction in NoC performance evaluation consists of combining simulation and analytical models in order to exploit their advantages together. Toward this end, we discuss several newly proposed approaches that use hardware-based or learning-based techniques. Finally, we summarize several open problems and our perspective to address these challenges.


IEEE Transactions on Very Large Scale Integration Systems | 2015

FSNoC: A Flit-Level Speedup Scheme for Network on-Chips Using Self-Reconfigurable Bidirectional Channels

Zhiliang Qian; Syed Mohsin Abbas; Chi-Ying Tsui

In this paper, we explore optimizing the bandwidth utilization of the network-on-chips (NoCs). We propose a flit-level speedup scheme to improve the NoC performance using self-reconfigurable bidirectional channels. For the NoC intrarouter bandwidth, in addition to allowing flits from different packets to use the idle internal bandwidth of the crossbar, our proposed flit-level speedup scheme also allows flits within the same packet to be transmitted simultaneously. For interrouter channels, a distributed channel configuration scheme is developed to dynamically change the link directions. In this way, the effective bandwidth between two routers can change adaptively depending on the run time network traffic. We present the implementation of the proposed flit-level speedup NoC on a 2-D mesh. An input buffer architecture, which supports reading and writing two flits from the same virtual channel at the same time, is proposed. The switch allocator is also designed to support flit-level parallel arbitration. Extensive simulations on both the synthetic traffic and real applications show performance improvement in throughput and latency over the existing architectures using bidirectional channels.


Design, Automation, and Test in Europe | 2012

A flit-level speedup scheme for network-on-chips using self-reconfigurable bi-directional channels

Zhiliang Qian; Ying Fei Teh; Chi-Ying Tsui

In this work, we propose a flit-level speedup scheme to enhance the network-on-chip (NoC) performance utilizing bidirectional channels. In addition to the traditional efforts on allowing flits of different packets to use the idle internal and external bandwidth of the bi-directional channel, our proposed flit-level speedup scheme also allows flits within the same packet to be transmitted simultaneously on the bi-directional channel. For inter-router transmission, a novel distributed channel configuration protocol is developed to dynamically control the link directions. For intra-router transmission, an input buffer architecture which supports reading and writing two flits from the same virtual channel at the same time is proposed. The switch allocator is also designed to support flit-level parallel arbitration. Simulation results on both synthetic traffic and real benchmarks show performance improvement in throughput and latency over the existing architectures using bi-directional channels.

Collaboration


Dive into Zhiliang Qian's collaborations.

Top Co-Authors

Chi-Ying Tsui

Hong Kong University of Science and Technology

Paul Bogdan

University of Southern California

Radu Marculescu

Carnegie Mellon University

Da-Cheng Juan

Carnegie Mellon University

Diana Marculescu

Carnegie Mellon University

Jingyang Zhu

Hong Kong University of Science and Technology

Ying Fei Teh

Hong Kong University of Science and Technology

Guopeng Wei

Carnegie Mellon University

Neil Shah

Carnegie Mellon University
