Zhiying Wang

National University of Defense Technology

Publication

Featured research published by Zhiying Wang.

international symposium on computer architecture | 2011

DBAR: an efficient routing algorithm to support multiple concurrent applications in networks-on-chip

Sheng Ma; Natalie D. Enright Jerger; Zhiying Wang

With the emergence of many-core architectures, it is quite likely that multiple applications will run concurrently on a system. Existing locally and globally adaptive routing algorithms largely overlook issues associated with workload consolidation. The shortsightedness of locally adaptive routing algorithms limits performance due to poor network congestion avoidance. Globally adaptive routing algorithms attack this issue by introducing a congestion propagation network to obtain network status information beyond neighboring nodes. However, they may suffer from intra- and inter-application interference during output port selection for consolidated workloads, coupling the behavior of otherwise independent applications and negatively affecting performance. To address these two issues, we propose Destination-Based Adaptive Routing (DBAR). We design a novel low-cost congestion propagation network that leverages both local and non-local network information for more accurate congestion estimates. Thus, DBAR offers effective adaptivity for congestion beyond neighboring nodes. More importantly, by integrating the destination into the selection function, DBAR mitigates intra- and inter-application interference and offers dynamic isolation among regions. Experimental results show that DBAR can offer better performance than the best baseline algorithm for all measured configurations; it is well suited for workload consolidation. The wiring overhead of DBAR is low and DBAR provides improvement in the energy-delay product for medium and high injection rates.

symposium on computer arithmetic | 2007

A New Architecture For Multiple-Precision Floating-Point Multiply-Add Fused Unit Design

Libo Huang; Li Shen; Kui Dai; Zhiying Wang

The floating-point multiply-add fused (MAF) unit sets a new trend in processor design to speed up floating-point performance in scientific and multimedia applications. This paper proposes a new architecture for the MAF unit that supports multiply-add operations (A×B+C) in multiple IEEE precisions with a Single Instruction Multiple Data (SIMD) feature. The proposed MAF unit can perform either one double-precision or two parallel single-precision operations using about 18% more hardware than a conventional double-precision MAF unit and with a 9% increase in delay. To accommodate the simultaneous computation of two single-precision MAF operations, several basic modules of the double-precision MAF unit are redesigned. They are either segmented by precision-mode-dependent multiplexers or augmented with duplicated hardware. The proposed MAF unit can be fully pipelined, and the experimental results show that it is suitable for processors with a floating-point unit (FPU).

high performance computer architecture | 2012

Whole packet forwarding: Efficient design of fully adaptive routing algorithms for networks-on-chip

Sheng Ma; Natalie D. Enright Jerger; Zhiying Wang

Routing algorithms for networks-on-chip (NoCs) typically only have a small number of virtual channels (VCs) at their disposal. Limited VCs pose several challenges to the design of fully adaptive routing algorithms. First, fully adaptive routing algorithms based on previous deadlock-avoidance theories require a conservative VC re-allocation scheme: a VC can only be re-allocated when it is empty, which limits performance. We propose a novel VC re-allocation scheme, whole packet forwarding (WPF), which allows a non-empty VC to be re-allocated. WPF leverages the observation that the majority of packets in NoCs are short. We prove that WPF does not induce deadlock if the routing algorithm is deadlock-free using conservative VC re-allocation. WPF is an important extension of previous deadlock-avoidance theories. Second, to efficiently utilize WPF in VC-limited networks, we design a novel fully adaptive routing algorithm which maintains packet adaptivity without significant hardware cost. Compared with conservative VC re-allocation, WPF achieves an average 88.9% saturation throughput improvement in synthetic traffic patterns and an average 21.3% and maximal 37.8% speedup for PARSEC applications with heavy network loads. Our design also offers higher performance than several partially adaptive and deterministic routing algorithms.

high performance computer architecture | 2012

Supporting efficient collective communication in NoCs

Sheng Ma; Natalie D. Enright Jerger; Zhiying Wang

Across many architectures and parallel programming paradigms, collective communication plays a key role in performance and correctness. Hardware support is necessary to prevent important collective communication from becoming a system bottleneck. Support for multicast communication in Networks-on-Chip (NoCs) has achieved substantial throughput improvements and power savings. In this paper, we explore support for reduction or many-to-one communication operations. As a case study, we focus on acknowledgement messages (ACK) that must be collected in a directory protocol before a cache line may be upgraded to or installed in the modified state. This paper makes two primary contributions: an efficient framework to support the reduction of ACK packets and a novel Balanced, Adaptive Multicast (BAM) routing algorithm. The proposed message combination framework complements several multicast algorithms. By combining ACK packets during transmission, this framework not only reduces packet latency by 14.1% for low-to-medium network loads, but also improves the network saturation throughput by 9.6% with little overhead. The balanced buffer resource configuration of BAM improves the saturation throughput by an additional 13.8%. For the PARSEC benchmarks, our design offers an average speedup of 12.7% and a maximal speedup of 16.8%.
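
The idea of combining ACK packets in transit can be illustrated with a minimal sketch. The abstract does not describe the framework's mechanics, so everything here is an assumption made for illustration: the function name `combine_acks`, the tuple layout `(destination, line_address, count)`, and the rule that ACKs sharing a destination and cache line are merged by summing their counts.

```python
from collections import defaultdict


def combine_acks(acks):
    """Merge ACKs that share a destination and cache-line address.

    `acks` is an iterable of (destination, line_address, count) tuples.
    This layout is hypothetical and is not taken from the paper.
    """
    merged = defaultdict(int)
    for destination, line_address, count in acks:
        # One combined packet carries the sum of the individual ACK counts.
        merged[(destination, line_address)] += count
    return [(dest, addr, count) for (dest, addr), count in merged.items()]


# Three ACK packets waiting at a router; two target the same requester and line.
pending = [(0, 0x40, 1), (0, 0x40, 1), (3, 0x80, 1)]
combined = combine_acks(pending)
```

In this toy model the router forwards two packets instead of three, which is the kind of traffic reduction the abstract credits for the latency and throughput gains.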

IEEE Transactions on Computers | 2015

Leaving One Slot Empty: Flit Bubble Flow Control for Torus Cache-Coherent NoCs

Sheng Ma; Zhiying Wang; Zonglin Liu; Natalie D. Enright Jerger

Short and long packets co-exist in cache-coherent NoCs. Existing designs for torus networks do not efficiently handle variable-size packets. For deadlock free operations, a design uses two VCs, which negatively affects the router frequency. Some optimizations use one VC. Yet, they regard all packets as maximum-length packets, inefficiently utilizing the precious buffers. We propose flit bubble flow control (FBFC), which maintains one free flit-size buffer slot to avoid deadlock. FBFC uses one VC, and does not treat short packets as long ones. It achieves both high frequency and efficient buffer utilization. FBFC performs 92.8 and 34.2 percent better than LBS and CBS for synthetic traffic in a 4 × 4 torus. The gains increase in larger networks; they are 107.2 and 40.1 percent in an 8 × 8 torus. FBFC achieves an average 13.0 percent speedup over LBS for PARSEC workloads. Our results also show that FBFC is more power efficient than LBS and CBS, and a torus with FBFC is more power efficient than a mesh.
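
The "one free flit-size slot" rule can be sketched as a toy injection check. This is a deliberately simplified model that assumes a single ring with a shared pool of flit slots and looks only at injection. The class name `Ring` and the method `try_inject` are hypothetical, and the actual FBFC design involves details the abstract does not give.

```python
class Ring:
    """Toy model of the flit buffers along one torus ring (hypothetical)."""

    def __init__(self, total_slots: int):
        self.total_slots = total_slots
        self.occupied = 0

    @property
    def free_slots(self) -> int:
        return self.total_slots - self.occupied

    def try_inject(self) -> bool:
        """Inject one flit only if a free slot (the bubble) remains afterwards."""
        if self.free_slots >= 2:
            self.occupied += 1
            return True
        return False


ring = Ring(total_slots=4)
accepted = [ring.try_inject() for _ in range(6)]
```

After the attempts above, the ring holds three flits and still has one empty slot, so flits already in the ring always have somewhere to move.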

IEEE Transactions on Parallel and Distributed Systems | 2014

Novel Flow Control for Fully Adaptive Routing in Cache-coherent NoCs

Sheng Ma; Zhiying Wang; Natalie D. Enright Jerger; Li Shen; Nong Xiao

Routing algorithms for cache-coherent NoCs only have limited VCs at their disposal, which poses challenges to the design of routing algorithms. Existing fully adaptive routing algorithms apply conservative VC re-allocation: only empty VCs can be re-allocated, which limits performance. We propose two novel flow control designs. First, whole packet forwarding (WPF) re-allocates a nonempty VC if the VC has enough free buffers for an entire packet. WPF does not induce deadlock if the routing algorithm is deadlock-free using conservative VC re-allocation. It is an important extension to several deadlock avoidance theories. Second, we extend Duato's theory to apply aggressive VC re-allocation on escape VCs without deadlock. Finally, we propose a design which maintains maximal routing flexibility with low hardware cost. For synthetic traffic, our design performs on average 88.9 percent better than existing fully adaptive routing. Our design is superior to partially adaptive and deterministic routing.
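
The two re-allocation conditions described above can be contrasted in a short sketch. It assumes a VC is modeled only by its buffer depth and occupancy. The names `VirtualChannel`, `can_reallocate_conservative`, and `can_reallocate_wpf` are hypothetical and do not come from the paper.

```python
from dataclasses import dataclass


@dataclass
class VirtualChannel:
    """Toy model of a virtual channel buffer (hypothetical, for illustration)."""
    depth: int     # total buffer slots, in flits
    occupied: int  # slots currently holding flits

    @property
    def free_slots(self) -> int:
        return self.depth - self.occupied


def can_reallocate_conservative(vc: VirtualChannel) -> bool:
    # Conservative scheme: only an empty VC may be re-allocated.
    return vc.occupied == 0


def can_reallocate_wpf(vc: VirtualChannel, packet_len: int) -> bool:
    # WPF condition as stated in the abstract: a non-empty VC may be
    # re-allocated if it has enough free buffers for the entire packet.
    return vc.free_slots >= packet_len


vc = VirtualChannel(depth=5, occupied=1)
short_ok = can_reallocate_wpf(vc, packet_len=1)  # single-flit packet fits
long_ok = can_reallocate_wpf(vc, packet_len=5)   # five-flit packet does not
```

With one flit already buffered, the conservative check rejects the VC outright, while the WPF check still admits any packet of up to four flits.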

autonomic and trusted computing | 2006

A dynamic trust model based on feedback control mechanism for p2p applications

Chenlin Huang; Huaping Hu; Zhiying Wang

Trust is critical in P2P online communities. Traditional trust and reputation mechanisms lack flexibility in modeling the diversity and dynamicity of trust in such environments. In this paper, we evaluate trust by introducing a servomechanism and propose DWTrust: a trust model based on dynamic weights for P2P applications. DWTrust adopts a novel feedback control mechanism to assess trust, in which a set of subjective weights is set and adjusted over time to reflect the dynamicity of the trust environment. Trust assessment is simplified by mapping the factors influencing trust to feedback on dynamic weights. A series of experiments is designed to demonstrate the effectiveness, benefit and adaptability of DWTrust.

international conference on parallel and distributed systems | 2005

Design of a Configurable Embedded Processor Architecture for DSP Functions

Hong Yue; Ming-che Lai; Kui Dai; Zhiying Wang

Most embedded applications are served today by general-purpose processors or special-purpose ASIC processors containing hundreds to thousands of ALUs. While such solutions are efficient, they lack flexibility and are not feasible for certain embedded applications. The ASIP (application specific instruction processor) design methodology not only satisfies the functionality and performance requirements of embedded systems but is also flexible, so it is widely adopted in the embedded processor design domain. Given the wide adoption of digital signal processing in embedded applications, this paper studies a configurable VLIW processor architecture based on TTA (transport triggered architecture) for high-performance digital signal processing in embedded systems. The methodology of ASIP design is applied and some handle optimizations are taken. The results show that the architecture achieves high performance on digital signal processing kernel applications, and its simplicity and flexibility encourage further development with tuned functionality.

international symposium on computers and communications | 2004

Protecting integrity and confidentiality for data communication

Fangyong Hou; Zhiying Wang; Yuhua Tang; Zhen Liu

This work presents a scheme for building a data communication system that can effectively protect data integrity and confidentiality. First, it briefly introduces the current state of integrity and confidentiality protection. Then, it puts forward a new cipher, which uses a keystream generator to produce an infinite number of frame secret keys based on an infinite root key space, and uses a unique one-off frame secret key for each data encryption/decryption. Based on this cipher, we construct a data communication system. This work illustrates how to build such a system and analyzes its protection of data integrity and confidentiality. Owing to the characteristics of the cipher, the system offers high resistance against cryptanalysis to prevent data disclosure, and it leaves little opportunity for the intractable attacks that can compromise data integrity.
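
The notion of a unique one-off key per frame can be sketched with a counter-based derivation. This is not the cipher from the paper, whose keystream generator the abstract does not specify. As a stand-in, the sketch assumes HMAC-SHA256 keyed with a root key and applied to the frame index. The function name `frame_key` and the root key value are hypothetical.

```python
import hashlib
import hmac


def frame_key(root_key: bytes, frame_index: int) -> bytes:
    """Derive a per-frame key from a root key and a frame counter.

    Stand-in derivation (assumption): HMAC-SHA256 over the 8-byte counter.
    """
    counter = frame_index.to_bytes(8, "big")
    return hmac.new(root_key, counter, hashlib.sha256).digest()


root = b"example-root-key"  # placeholder value for illustration only
keys = [frame_key(root, i) for i in range(3)]
```

Sender and receiver that share the root key and agree on the frame index derive the same 32-byte key independently, and each frame index yields a different key.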

grid and cooperative computing | 2004

Modeling Time-Related Trust

Chenlin Huang; Huaping Hu; Zhiying Wang

Most trust models today treat trust as a quantitative constant, focusing on the representation and operations of uncertain beliefs, but ignore the fact that some beliefs change with time. To describe the change of trust relationships, the notion of Time-related Trust is introduced in this paper and the relationship between time and trust is discussed. Then a time-related opinion is defined to represent time-related trust relationships based on Josang’s work. A trust model based on it is also presented for modeling and reasoning about time-related trust. With the model we further our analysis and discussion of the effects of time on trust, and some properties of time-related opinions are derived. Our work is helpful for understanding the dynamic property of trust and making the management of trust relationships more rational.