Zhonghai Lu | Researchain

Archive Network Publication Hotspot Collaboration

Network

Latest external collaboration on country level. Dive into details by clicking on the dots.

Explore More

Hotspot

Dive into the research topics where Zhonghai Lu is active.

Explore More

Publication

Featured researches published by Zhonghai Lu.

field-programmable logic and applications | 2009

Run-time Partial Reconfiguration speed investigation and architectural design space exploration

Ming Liu; Wolfgang Kuehn; Zhonghai Lu; Axel Jantsch

Run-time Partial Reconfiguration (PR) speed is significant in applications especially when fast IP core switching is required. In this paper, we propose to use Direct Memory Access (DMA), Master (MST) burst, and a dedicated Block RAM (BRAM) cache respectively to reduce the reconfiguration time. Based on the Xilinx PR technology and the Internal Configuration Access Port (ICAP) primitive in the FPGA fabric, we discuss multiple design architectures and thoroughly investigate their performance with measurements for different partial bitstream sizes. Compared to the reference OPB HWICAP and XPS HWICAP designs, experimental results showthatDMA HWICAP and MST HWICAP reduce the reconfiguration time by one order of magnitude, with little resource consumption overhead. The BRAM HWICAP design can even approach the reconfiguration speed limit of the ICAP primitive at the cost of large Block RAM utilization.

network on chip architectures | 2010

A reconfigurable fault-tolerant deflection routing algorithm based on reinforcement learning for network-on-chip

Chaochao Feng; Zhonghai Lu; Axel Jantsch; Jinwen Li; Minxuan Zhang

We propose a reconfigurable fault-tolerant deflection routing algorithm (FTDR) based on reinforcement learning for NoC. The algorithm reconfigures the routing table through a kind of reinforcement learning---Q-learning using 2-hop fault information. It is topology-agnostic and insensitive to the shape of the fault region. In order to reduce the routing table size, we also propose a hierarchical Q-learning based deflection routing algorithm (FTDR-H) with area reduction up to 27% for a switch in an 8 x 8 mesh compared to the original FTDR. Experimental results show that in the presence of faults, FTDR and FTDR-H are better than other fault-tolerant deflection routing algorithms and a turn model based fault-tolerant routing algorithm.

IEEE Transactions on Very Large Scale Integration Systems | 2008

TDM Virtual-Circuit Configuration for Network-on-Chip

Zhonghai Lu; Axel Jantsch

In network-on-chip (NoC), time-division-multiplexing (TDM) virtual circuits (VCs) have been proposed to satisfy the quality-of-service requirements of applications. TDM VC is a connection-oriented communication service by which two or more connections take turns to share buffers and link bandwidth using dedicated time slots. In the paper, we first give a formulation of the multinode VC configuration problem for arbitrary NoC topologies. A multinode VC allows multiple source and destination nodes on it. Then we address the two problems of path selection and slot allocation for TDM VC configuration. For the path selection, we use a backtracking algorithm to explore the path diversity, constructively searching the solution space. In the slot allocation phase, overlapped VCs must be configured such that no conflict occurs and their bandwidth requirements are satisfied. We define the concept of a logical network (LN) as an infinite set of associated (time slot, buffer) pairs with respect to a buffer on a given VC. Based on this concept, we develop and prove theorems that constitute sufficient and necessary conditions to establish conflict-free VCs. They are applicable for networks where all nodes operate with the same clock frequency but allowing different phases. Using these theorems, slot allocation for VCs is a procedure of assigning VCs to different LNs. TDM VC configuration can thus be predictable and correct-by-construction. Our experiments on synthetic and real applications validate the effectiveness and efficiency of our approach.

ieee computer society annual symposium on vlsi | 2006

Connection-oriented multicasting in wormhole-switched networks on chip

Zhonghai Lu; Bei Yin; Axel Jantsch

Network-on-chip (NoC) proposes networks to replace buses as a scalable global communication interconnect for future SoC designs. However, a bus is very efficient in broadcasting. As the system size scales up to explore the chip capacity, broadcasting in NoCs must be efficiently supported. This paper presents a novel multicast scheme in wormhole-switched NoCs. By this scheme, a multicast procedure consists of establishment, communication and release phase. A multicast group can request to reserve virtual channels during establishment and has priority on arbitration of link bandwidth. This multicasting method has been effectively implemented in a mesh network with deadlock freedom. Our experiments show that the multicast technique improves throughput, and does not exhibit significant impact on unicast performance in a network with mixed unicast and multicast traffic if the network is not saturated

networks on chips | 2009

Analysis of worst-case delay bounds for best-effort communication in wormhole networks on chip

Yue Qian; Zhonghai Lu; Wenhua Dou

In packet-switched network-on-chip, computing worst-case delay bounds is crucial for designing predictable and cost-effective communication systems but yet an intractable problem due to complicated resource sharing scenarios. For wormhole networks with credit-based flow control, the existence of cyclic dependency between flit delivery and credit generation further complicates the problem. Based on network calculus, we propose a technique for analyzing communication delay bounds for individual flows in wormhole networks. We first propose router service analysis models for flow control, link and buffer sharing. Based on these analysis models, we obtain a buffering-sharing analysis network, which is open-ended and captures both flow control and link sharing. Furthermore, we compute equivalent service curves for individual flows using the network contention tree model in the buffer-sharing analysis network, and then derive their delay bounds. Our experimental results verify that the theoretical bounds are correct and tight.

great lakes symposium on vlsi | 2006

Evaluation of on-chip networks using deflection routing

Zhonghai Lu; Mingchen Zhong; Axel Jantsch

Deflection routing is being proposed for networks on chips since it is simple and adaptive. A deflection switch can be much smaller and faster than a wormhole or virtual cut-through switch. A deflection-routed network has three orthogonal characteristics: topology, routing algorithm and deflection policy. In this paper we evaluate deflection networks with different topologies such as mesh, torus and Manhattan Street Network, different routing algorithms such as random, dimension XY, delta XY and minimum deflection, as well as different deflection policies such as non-priority, weighted priority and straight-through policies. Our results suggest that the performance of a deflection network is more sensitive to its topology than the other two parameters. It is less sensitive to its routing algorithm, but a routing algorithm should be minimal. A priority-based deflection policy that uses global and history-related criterion can achieve both better average-case and worst-case performance than a non-priority or priority policy that uses local and stateless criterion. These findings are important since they can guide designers to make right decisions on the deflection network architecture, for instance, selecting a routing algorithm or deflection policy which has potentially low cost and high speed for hardware implementation.

IEEE Transactions on Very Large Scale Integration Systems | 2013

Addressing Transient and Permanent Faults in NoC With Efficient Fault-Tolerant Deflection Router

Chaochao Feng; Zhonghai Lu; Axel Jantsch; Minxuan Zhang; Zuocheng Xing

Continuing decrease in the feature size of integrated circuits leads to increases in susceptibility to transient and permanent faults. This paper proposes a fault-tolerant solution for a bufferless network-on-chip, including an on-line fault-diagnosis mechanism to detect both transient and permanent faults, a hybrid automatic repeat request, and forward error correction link-level error control scheme to handle transient faults and a reinforcement-learning-based fault-tolerant deflection routing (FTDR) algorithm to tolerate permanent faults without deadlock and livelock. A hierarchical-routing-table-based algorithm (FTDR-H) is also presented to reduce the area overhead of the FTDR router. Synthesized results show that, compared with the FTDR router, the FTDR-H router can reduce the area by 27% in an 88 network. Simulation results demonstrate that under synthetic workloads, in the presence of permanent link faults, the throughput of an 8 8 network with FTDR and FTDR-H algorithms are 14% and 23% higher on average than that with the fault-on-neighbor (FoN) aware deflection routing algorithm and the cost-based deflection routing algorithm, respectively. Under real application workloads, the FTDR-H algorithm achieves 20% less hop counts on average than that of the FoN algorithm. For transient faults, the performance of the FTDR router can achieve graceful degradation even at a high fault rate. We also implement the fault-tolerant deflection router which can achieve 400 MHz in TSMC 65-nm technology.

IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems | 2010

Analysis of Worst-Case Delay Bounds for On-Chip Packet-Switching Networks

Yue Qian; Zhonghai Lu; Wenhua Dou

In network-on-chip (NoC), computing worst-case delay bounds for packet delivery is crucial for designing predictable systems but yet an intractable problem. This paper presents an analysis technique to derive per-flow communication delay bound. Based on a network contention model, this technique, which is topology independent, employs network calculus to first compute the equivalent service curve for an individual flow and then calculate its packet delay bound. To exemplify this method, this paper also presents the derivation of a closed-form formula to compute a flows delay bound under all-to-one gather communication. Experimental results demonstrate that the theoretical bounds are correct and tight.

IEEE Transactions on Very Large Scale Integration Systems | 2013

An Analytical Latency Model for Networks-on-Chip

Abbas Eslami Kiasari; Zhonghai Lu; Axel Jantsch

We propose an analytical model based on queueing theory for delay analysis in a wormhole-switched network-on-chip (NoC). The proposed model takes as input an application communication graph, a topology graph, a mapping vector, and a routing matrix, and estimates average packet latency and router blocking time. It works for arbitrary network topology with deterministic routing under arbitrary traffic patterns. This model can estimate per-flow average latency accurately and quickly, thus enabling fast design space exploration of various design parameters in NoC designs. Experimental results show that the proposed analytical model can predict the average packet latency more than four orders of magnitude faster than an accurate simulation, while the computation error is less than 10% in non-saturated networks for different system-on-chip platforms.

networks on chips | 2009

Scalability of network-on-chip communication architecture for 3-D meshes

Awet Yemane Weldezion; Matt Grange; Dinesh Pamunuwa; Zhonghai Lu; Axel Jantsch; Roshan Weerasekera; Hannu Tenhunen

Design Constraints imposed by global interconnect delays as well as limitations in integration of disparate technologies make 3-D chip stacks an enticing technology solution for massively integrated electronic systems. The scarcity of vertical interconnects however imposes special constraints on the design of the communication architecture. This article examines the performance and scalability of different communication topologies for 3-D Network-on-Chips (NoC) using Through-Silicon-Vias (TSV) for inter-die connectivity. Cycle accurate RTL-level simulations are conducted for two communication schemes based on a 7-port switch and a centrally arbitrated vertical bus using different traffic patterns. The scalability of the 3-D NoC is examined under both communication architectures and compared to 2-D NoC structures in terms of throughput and latency in order to quantify the variation of network performance with the number of nodes and derive key design guidelines.

Explore More