Thomas Canhao Xu | Researchain

Publication

Featured research published by Thomas Canhao Xu.

Journal of Systems Architecture | 2011

A generic adaptive path-based routing method for MPSoCs

Masoud Daneshtalab; Masoumeh Ebrahimi; Thomas Canhao Xu; Pasi Liljeberg; Hannu Tenhunen

Several routing protocols have been presented for unicast traffic in MPSoCs. Using unicast routing algorithms for multicast traffic increases the likelihood of deadlock and congestion. To avoid deadlock for multicast traffic, the Hamiltonian path strategy was introduced. Traditional Hamiltonian path routing protocols that support both unicast and multicast traffic are based on deterministic models, leading to lower performance. In this paper, we propose an adaptive routing protocol for both unicast and multicast traffic without using virtual channels. The proposed method maximizes the degree of adaptiveness of the routing functions, which are based on the Hamiltonian path, while guaranteeing deadlock freedom. Furthermore, the unicast and multicast aspects of the presented method have each been investigated extensively. Results obtained with both synthetic and real traffic models show that the proposed adaptive method has lower latency and power dissipation for both multicast and unicast traffic than previously proposed path-based multicasting algorithms, with negligible hardware overhead.
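
The Hamiltonian path strategy mentioned above can be illustrated with a minimal sketch. It assumes the common snake-like node labeling on a 2D mesh and the classic split of multicast destinations into a "high" and a "low" subset relative to the source label. The function names are hypothetical, and the adaptive routing functions proposed in the paper are not reproduced here.

```python
def snake_label(x, y, width):
    """Hamiltonian-path label of node (x, y) in a mesh with `width` columns.

    Even rows are numbered left to right and odd rows right to left, so
    consecutive labels always belong to neighbouring nodes.
    """
    if y % 2 == 0:
        return y * width + x
    return y * width + (width - 1 - x)


def split_destinations(source, destinations, width):
    """Partition multicast destinations by label relative to the source.

    Returns (high, low): `high` is visited in ascending label order and
    `low` in descending label order, so each message copy follows the
    Hamiltonian path in one direction only.
    """
    src = snake_label(source[0], source[1], width)
    labelled = [(snake_label(x, y, width), (x, y)) for x, y in destinations]
    high = [node for label, node in sorted(labelled) if label > src]
    low = [node for label, node in sorted(labelled, reverse=True) if label < src]
    return high, low


# Illustrative 4x4 mesh with a source at (1, 1); the values are made up.
high, low = split_destinations((1, 1), [(0, 0), (3, 0), (3, 1), (0, 1), (2, 2)], 4)
print("high subset:", high)
print("low subset:", low)
```

Because each subset is traversed in a single direction along the labeling, cyclic channel dependencies between the two copies cannot form in this deterministic scheme.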

International Conference on Electronics and Information Engineering | 2010

A study of Through Silicon Via impact to 3D Network-on-Chip design

Thomas Canhao Xu; Pasi Liljeberg; Hannu Tenhunen

The adoption of a 3D Network-on-Chip (NoC) design depends on the performance and manufacturing cost of the chip. Therefore, a study of the Through Silicon Via (TSV), which connects different layers of a 3D chip, is crucial. In this paper, we analyze the impact of TSV design in 3D NoCs. A 3D NoC with five layers is modeled based on modern 2D chips. We discuss the number of TSVs required for a 3D NoC. Different placements of half and quarter layer-layer connections are explored. We present benchmark results using a cycle-accurate full-system simulator based on realistic workloads. Experiments show that under different workloads, the average network latencies with full and half layer-layer connections are reduced by 5.24% and 2.18% respectively, compared with the quarter design. Our analysis and experimental results provide a guideline for designing TSVs in 3D NoCs that balances the tradeoff between performance and manufacturing cost.

Design and Diagnostics of Electronic Circuits and Systems | 2011

Optimal number and placement of Through Silicon Vias in 3D Network-on-Chip

Thomas Canhao Xu; Pasi Liljeberg; Hannu Tenhunen

In this paper, we analyze the performance impact of different numbers of Through Silicon Vias (TSVs) in a 3D Network-on-Chip (NoC). The adoption of a 3D NoC design depends on the performance and manufacturing cost of the chip. Therefore, a study of the placement of the TSVs, which connect different layers of a 3D chip, is crucial. A 64-core 3D NoC is modeled based on state-of-the-art 2D chips. We discuss the number of TSVs required for a 3D NoC. Different placements of layer-layer connections are explored. We present benchmark results using a cycle-accurate full-system simulator based on realistic workloads. Experiments show that under different workloads, the average network latencies in two configurations (full and quarter connection) are reduced by 14.78% and 7.38% respectively, compared with the one-eighth connection design. The performance improvement comes at the expense of manufacturing cost. Our analysis and experimental results provide a guideline for selecting the optimal number of TSVs in 3D NoCs.
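
As a rough illustration of why fewer vertical links raise latency, the sketch below computes the average shortest-path hop count in a toy 3D mesh where only some (x, y) positions carry a vertical link. The 4x4x4 dimensions, the pillar placements, and the function name are assumptions chosen for illustration. Hop count is only a proxy and does not reproduce the paper's full-system simulation results.

```python
from collections import deque
from itertools import product


def average_hops(width, depth, layers, pillars):
    """Mean shortest-path hop count over all ordered pairs of distinct nodes.

    `pillars` is the set of (x, y) positions that have a vertical link
    between every pair of adjacent layers. At least one pillar is assumed,
    otherwise the layers are disconnected and the result is meaningless.
    """
    nodes = list(product(range(width), range(depth), range(layers)))

    def neighbours(node):
        x, y, z = node
        # Horizontal mesh links inside a layer.
        for dx, dy in ((1, 0), (-1, 0), (0, 1), (0, -1)):
            if 0 <= x + dx < width and 0 <= y + dy < depth:
                yield (x + dx, y + dy, z)
        # Vertical links exist only at pillar positions.
        if (x, y) in pillars:
            for dz in (1, -1):
                if 0 <= z + dz < layers:
                    yield (x, y, z + dz)

    total = 0
    for start in nodes:
        # Breadth-first search gives hop distances from `start`.
        dist = {start: 0}
        queue = deque([start])
        while queue:
            current = queue.popleft()
            for nxt in neighbours(current):
                if nxt not in dist:
                    dist[nxt] = dist[current] + 1
                    queue.append(nxt)
        total += sum(dist.values())
    return total / (len(nodes) * (len(nodes) - 1))


full = set(product(range(4), range(4)))
quarter = {(1, 1), (1, 2), (2, 1), (2, 2)}  # hypothetical placement
eighth = {(1, 1), (2, 2)}                   # hypothetical placement
for name, pillars in (("full", full), ("quarter", quarter), ("one-eighth", eighth)):
    print(name, round(average_hops(4, 4, 4, pillars), 3))
```

Trying different pillar sets of the same size shows that placement, not only the count, changes the average distance.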

Journal of Systems Architecture | 2013

Optimal placement of vertical connections in 3D Network-on-Chip

Thomas Canhao Xu; Gert Schley; Pasi Liljeberg; Martin Radetzki; Juha Plosila; Hannu Tenhunen

Due to technological limitations, the manufacturing yield of vertical connections (Through Silicon Vias, TSVs) in 3D Networks-on-Chip (NoCs) decreases rapidly as the number of TSVs grows. The adoption of a 3D NoC design depends on the performance and manufacturing cost of the chip. This article presents methods for allocating and placing a minimal number of vertical links and the corresponding vertical routers to achieve specified performance goals. A second optimization step maximizes redundancy in order to deal with failing TSVs. Globally optimal solutions are determined for the first time for meshes up to 17x17 nodes in size. A 64-core 3D NoC is modeled based on state-of-the-art 2D chips. We present benchmark results using a cycle-accurate full-system simulator based on realistic workloads. Experiments show that under different workloads, an optimal placement with 25% of vertical connections achieved 81.3% of average network latency and 76.5% of energy delay product, compared with full layer-layer connection. The performance with 12.5% and 6.25% of vertical connections is also evaluated. Our analysis and experimental results provide a guideline for future 3D NoC design.

Network and Parallel Computing | 2009

Explorations of Honeycomb Topologies for Network-on-Chip

Alexander Wei Yin; Thomas Canhao Xu; Pasi Liljeberg; Hannu Tenhunen

Rectangular mesh and torus are the most commonly used topologies in network-on-chip (NoC) based systems. In this paper, we quantitatively illustrate that the honeycomb topology is an advantageous design alternative in terms of network cost, one of the most important parameters, which reflects both network performance and implementation cost. Compared with the rectangular mesh and torus, honeycomb mesh and torus topologies lead to a 40% decrease in network cost. We then explore the NoC-related topological properties of both honeycomb mesh and torus topologies. By transforming the honeycomb topologies into rectangular brick shapes, we demonstrate that the honeycomb topologies can feasibly be implemented with rectangular devices. We also propose a 3D honeycomb topology, since 3D IC has become an emerging and promising technique. Another contribution of this paper is the proposal of deadlock-free routing algorithms. Based on either the concept of the turn model or the logical network, deadlock-free routing can be achieved for all the discussed honeycomb topologies.
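
The roughly 40% figure can be checked with a back-of-the-envelope sketch, assuming network cost is defined as node degree times network diameter. The diameter coefficients below are approximate values commonly quoted for these topologies with n nodes. They are an assumption here, not numbers taken from this abstract.

```python
import math

# (node degree, c) where diameter is approximately c * sqrt(n) for n nodes.
# Assumed approximate coefficients, not values stated in the abstract.
TOPOLOGIES = {
    "mesh": (4, 2.0),
    "honeycomb mesh": (3, 1.63),
    "torus": (4, 1.0),
    "honeycomb torus": (3, 0.81),
}


def network_cost(name, n):
    """Network cost modelled as degree * diameter for an n-node network."""
    degree, coeff = TOPOLOGIES[name]
    return degree * coeff * math.sqrt(n)


def cost_reduction(baseline, candidate, n):
    """Fractional cost reduction of `candidate` relative to `baseline`."""
    return 1.0 - network_cost(candidate, n) / network_cost(baseline, n)


print(f"mesh -> honeycomb mesh:   {cost_reduction('mesh', 'honeycomb mesh', 64):.1%}")
print(f"torus -> honeycomb torus: {cost_reduction('torus', 'honeycomb torus', 64):.1%}")
```

Under these assumed coefficients the reduction is independent of n, since the square-root term cancels in the ratio.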

Design and Diagnostics of Electronic Circuits and Systems | 2010

Tree-model based mapping for energy-efficient and low-latency Network-on-Chip

Bo Yang; Thomas Canhao Xu; Tero Säntti; Juha Plosila

With NoC sizes growing constantly, efficient algorithms are needed to provide power/performance-aware task mapping on massively parallel systems. In this paper, a novel tree-model based mapping algorithm is proposed to achieve high energy efficiency and low latency on NoC platforms. A NoC is abstracted as a tree composed of a root node and median nodes at different levels. By mapping tasks starting from the root of the tree, our algorithm minimizes the communication cost and consequently reduces the energy consumption and network delay. Experimental results show that the run-time of our algorithm is decreased by 90% on average compared to the Greedy Incremental (GI) algorithm. Full-system simulation also shows that for Radix traffic, compared to the original random mapping, GI achieves 18.7% and 17.3% reductions in energy consumption and average network latency respectively, while our algorithm achieves 24.7% and 40.8% reductions respectively.
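
The communication cost that such a mapping tries to minimize can be sketched under a common assumption: cost is the sum, over communicating task pairs, of traffic volume times Manhattan hop distance on a 2D mesh. This metric, the function names, and the example data are illustrative assumptions. The tree-model algorithm itself is not reproduced.

```python
def manhattan(a, b):
    """Hop distance between two mesh coordinates under XY routing."""
    return abs(a[0] - b[0]) + abs(a[1] - b[1])


def communication_cost(traffic, placement):
    """Sum of volume * hop distance over all communicating task pairs.

    `traffic` maps (source_task, destination_task) to a traffic volume and
    `placement` maps each task to its (x, y) node.
    """
    return sum(
        volume * manhattan(placement[src], placement[dst])
        for (src, dst), volume in traffic.items()
    )


# Made-up task graph and two candidate placements.
traffic = {("t0", "t1"): 10, ("t1", "t2"): 5}
clustered = {"t0": (0, 0), "t1": (0, 1), "t2": (1, 1)}
scattered = {"t0": (0, 0), "t1": (3, 3), "t2": (0, 3)}
print(communication_cost(traffic, clustered), communication_cost(traffic, scattered))
```

Placing heavily communicating tasks on neighbouring nodes lowers this cost, which is the intuition behind reduced energy and latency.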

Parallel, Distributed and Network-Based Processing | 2014

Mixed-Criticality Run-Time Task Mapping for NoC-Based Many-Core Systems

Mohammad Fattah; Amir-Mohammad Rahmani; Thomas Canhao Xu; Anil Kanduri; Pasi Liljeberg; Juha Plosila; Hannu Tenhunen

Contiguous processor allocation improves both network and application performance by decreasing the probability of congestion among the communications of different applications. Consequently, the average, standard deviation and worst-case latency of the network are decreased significantly. This makes contiguous allocation a good solution for time-critical applications with bounded deadlines. On the other hand, non-contiguous allocation increases system throughput significantly: isolated nodes are utilized and more applications can finish their jobs per unit of time. However, this leads to poor network metrics, unsuitable for real-time applications. In this work, we combine these two approaches in order to manage workloads with mixed-criticality characteristics. Real-time applications are mapped contiguously, while non-critical applications are allowed to be dispersed over the available system nodes. Results show over 50% improvement in worst-case latency and a 100-fold improvement in deadline misses.

NORCHIP | 2010

Multi-application multi-step mapping method for many-core Network-on-Chips

Bo Yang; Liang Guang; Thomas Canhao Xu; Alexander Wei Yin; Tero Säntti; Juha Plosila

Massively parallel computing performed on many-core Networks-on-Chip (NoCs) is the future of computing. One feasible approach to implementing parallel computing is to deploy multiple applications on the NoC simultaneously. In this paper, we propose a multi-application mapping method in two steps: application mapping, which finds a region on the NoC for each application, followed by task mapping, which maps all tasks of the application into that region. In the application mapping step, several strategies based on the maximal empty rectangle (MER) technique are introduced for finding an optimal region for each application. In the task mapping step, a tree-model based algorithm is used to reduce communication latency and energy consumption. The experimental results show that the proposed method can achieve a considerable reduction in network latency and energy consumption (up to 18%) for a given set of applications.
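
To give a feel for the region-finding step, the sketch below locates the largest free rectangle in an occupancy grid using the standard histogram-and-stack method. It is a stand-in under stated assumptions: the paper's MER-based strategies may enumerate or rank maximal empty rectangles differently, and the function name and grid are hypothetical.

```python
def largest_empty_rectangle(occupied):
    """Return (area, top, left, height, width) of a largest all-free rectangle.

    `occupied` is a list of equally long rows; truthy cells are busy nodes.
    Ties are resolved in favour of the first rectangle found.
    """
    if not occupied or not occupied[0]:
        return (0, 0, 0, 0, 0)
    cols = len(occupied[0])
    heights = [0] * cols
    best = (0, 0, 0, 0, 0)
    for row_idx, row in enumerate(occupied):
        # heights[c] counts consecutive free cells in column c ending at this row.
        for c in range(cols):
            heights[c] = 0 if row[c] else heights[c] + 1
        stack = []  # column indices whose heights are strictly increasing
        for c in range(cols + 1):
            current = heights[c] if c < cols else 0  # sentinel flushes the stack
            while stack and heights[stack[-1]] >= current:
                h = heights[stack.pop()]
                left = stack[-1] + 1 if stack else 0
                w = c - left
                if h * w > best[0]:
                    best = (h * w, row_idx - h + 1, left, h, w)
            stack.append(c)
    return best


# Made-up 4x4 NoC occupancy map: 1 marks a node already in use.
grid = [
    [1, 0, 0, 0],
    [0, 0, 0, 0],
    [0, 0, 1, 0],
    [0, 0, 0, 0],
]
area, top, left, height, width = largest_empty_rectangle(grid)
print(area, (top, left), (height, width))
```

An application needing k nodes could then be assigned a sub-rectangle of the returned region when k does not exceed its area.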

NORCHIP | 2009

A study of 3D Network-on-Chip design for data parallel H.264 coding

Thomas Canhao Xu; Alexander Wei Yin; Pasi Liljeberg; Hannu Tenhunen

In this paper, we study and analyze different Network-on-Chip (NoC) designs for MPEG-4/H.264 coding. The encoding and decoding processes of H.264 are analyzed. We discuss the parallelism of H.264, and an open-source encoding program is used as a case study. The contribution of this paper lies in the NoC design method and the performance evaluation of a data-parallel H.264 coder. Our study shows that the inter-thread data dependencies of shared reads and writes are performance bottlenecks. Different non-uniform cache access NoC designs are explored. Two-dimensional (2D) and three-dimensional (3D) NoCs are analyzed in terms of hop count and heat dissipation. We present benchmark results using a cycle-accurate full-system simulator based on realistic workloads. Experiments show that under different workloads, the average network latencies in two 3D NoC designs are reduced by up to 34% compared with the 2D NoC. It is also shown that heat dissipation is a trade-off consideration in improving the performance of 3D ICs. Our analysis and experimental results provide a guideline for designing efficient 3D NoCs for data-parallel H.264 coding applications.

Network on Chip Architectures | 2012

A high-efficiency low-cost heterogeneous 3D network-on-chip design

Thomas Canhao Xu; Pasi Liljeberg; Juha Plosila; Hannu Tenhunen

In this paper, we propose and analyze a heterogeneous three-dimensional (3D) Network-on-Chip (NoC) design based on the optimized placement of vertical connections. The NoC paradigm is expected to be the solution for future multicore processors, while 3D NoC extends the on-chip network vertically. Most previous research focuses on symmetric, homogeneous, fully connected 3D NoC designs. However, these designs may not be suitable for production and the market. The adoption of a 3D NoC design depends on the performance, power consumption and manufacturing cost of the chip. Here, we propose a 3D NoC design that improves performance and reduces power consumption and manufacturing cost. First, the vertical connections between layers are reduced and placed optimally. Second, the routers and links are redesigned to fit the heterogeneous nature of the network. The 3D NoC design is discussed with two configurations. We model a 64-core 3D NoC based on state-of-the-art 2D NoCs. A cycle-accurate full-system simulator is used for benchmark results. Experiments show that under different applications, the average execution times in the two configurations are reduced by 5.5% and 20.7% respectively, compared with the homogeneous design. The average energy delay product of our design is twice as good as that of the diagonal heterogeneous design. This paper provides inspiration for designing high-performance, low-power, low-cost 3D NoCs.
