
Chunyuan Zhang

National University of Defense Technology



Publication

Featured research published by Chunyuan Zhang.

high performance computing and communications | 2007

Quantification of cut sequence set for fault tree analysis

Dong Liu; Chunyuan Zhang; Weiyan Xing; Rui Li; Haiyan Li

A new evaluation method is presented that employs the cut sequence set (CSS) to analyze fault trees. A cut sequence is a set of basic events that fail in a specific order and thereby induce the top event. The CSS is the aggregate of all the cut sequences in a fault tree. The paper continues the authors' earlier research on CSS and uses the CSS, composed of sequential failure expressions (SFEs), to represent the occurrence of the top event. According to the time relationships among the events in each SFE, an SFE can be evaluated by different multi-integration formulas, and the occurrence probability of the top event is then obtained by summing the evaluation results of all SFEs. Approximate approaches are also put forward to simplify computation. Finally, an example illustrates the application of CSS quantification. CSS and its quantification provide a new and compact approach to evaluating fault trees.
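As a rough illustration of how a single sequential failure expression might be quantified by integration, the sketch below evaluates the probability that event A fails before event B, with both failing within mission time t. It assumes independent, exponentially distributed lifetimes with rates `lam_a` and `lam_b`; that distribution, the function names, and the numeric values are assumptions made for this example and are not taken from the paper. Under those assumptions the double integral reduces to a single integral over B's failure time, which has a closed form.

```python
import math

def seq_failure_prob_closed(lam_a, lam_b, t):
    """P(A fails, then B fails, both within [0, t]) for independent
    exponential lifetimes (an assumption of this sketch)."""
    s = lam_a + lam_b
    return (1.0 - math.exp(-lam_b * t)) - (lam_b / s) * (1.0 - math.exp(-s * t))

def seq_failure_prob_numeric(lam_a, lam_b, t, steps=20000):
    """Midpoint-rule evaluation of
    integral_0^t  f_B(u) * F_A(u) du,
    where f_B is B's density and F_A is A's distribution function."""
    h = t / steps
    total = 0.0
    for i in range(steps):
        u = (i + 0.5) * h
        total += lam_b * math.exp(-lam_b * u) * (1.0 - math.exp(-lam_a * u))
    return total * h

# Hypothetical failure rates (per hour) and mission time (hours).
p_closed = seq_failure_prob_closed(1e-3, 2e-3, 1000.0)
p_numeric = seq_failure_prob_numeric(1e-3, 2e-3, 1000.0)
```

A useful sanity check is that the probabilities of the two possible orders, A before B and B before A, add up to the probability that both events have failed by time t.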

international conference on embedded software and systems | 2007

Cut Sequence Set Generation for Fault Tree Analysis

Dong Liu; Weiyan Xing; Chunyuan Zhang; Rui Li; Haiyan Li

For a fault tree, especially a dynamic fault tree, the occurrence of the top event depends not only on the combination of basic events but also on the order in which they occur. A cut sequence is a set of basic events that fail in a particular order and thereby induce the top event. The cut sequence set (CSS) is the aggregate of all cut sequences in a fault tree. The paper puts forward an algorithm, named CSSA (CSS Algorithm), to generate the CSS of a fault tree. CSSA uses the sequential failure symbol (SFS) to describe the sequential failure relationship between two events. A cut sequence can then be expressed by a sequential failure expression (SFE), which is a chain of basic events connected by SFSs. To obtain the CSS, SFS transformation is applied to static gates and dynamic gates, and the result is reconstructed into the standard form of the CSS according to the inference rules of SFE. Finally, an example illustrates the detailed steps of the approach. CSSA provides a new qualitative method that extends the existing fault tree analysis models.
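To make the notion of a cut sequence concrete, the following sketch enumerates the cut sequences of a very small fault tree by brute force. It is not CSSA and does not use SFS transformation or the SFE inference rules; it simply tries every ordered selection of basic events. The tree encoding, the gate names, and the `PAND` (priority-AND) semantics used here are assumptions made for this illustration.

```python
from itertools import permutations

INF = float("inf")

def fail_time(node, when):
    """Return the step at which `node` fails, or INF if it never fails."""
    if isinstance(node, str):            # basic event
        return when.get(node, INF)
    gate, children = node
    times = [fail_time(c, when) for c in children]
    if gate == "OR":
        return min(times)
    if gate == "AND":
        return max(times)
    if gate == "PAND":                   # all inputs fail, in left-to-right order
        in_order = all(a <= b for a, b in zip(times, times[1:]))
        return times[-1] if in_order else INF
    raise ValueError(f"unknown gate: {gate}")

def top_occurs(tree, seq):
    """True if failing the events of `seq` in that order triggers the top event."""
    when = {event: step for step, event in enumerate(seq)}
    return fail_time(tree, when) < INF

def cut_sequences(tree, events):
    """Brute-force list of sequences in which every event is needed."""
    found = []
    for n in range(1, len(events) + 1):
        for seq in permutations(events, n):
            if not top_occurs(tree, seq):
                continue
            # Discard the sequence if dropping any single event still triggers the top event.
            if all(not top_occurs(tree, seq[:i] + seq[i + 1:]) for i in range(n)):
                found.append(seq)
    return found

# Hypothetical tree: TOP = (A then B) OR (C AND D)
TREE = ("OR", [("PAND", ["A", "B"]), ("AND", ["C", "D"])])
RESULT = cut_sequences(TREE, ["A", "B", "C", "D"])
```

For this tree the enumeration yields A followed by B, plus both orders of C and D, which shows how a dynamic gate fixes the order while a static AND gate admits every order. The brute-force search grows factorially with the number of basic events, so it is only suitable for toy examples.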

british machine vision conference | 2015

Enable Scale and Aspect Ratio Adaptability in Visual Tracking with Detection Proposals.

Dafei Huang; Lei Luo; Mei Wen; Zhaoyun Chen; Chunyuan Zhang

Among the increasingly complicated trackers in the visual tracking area, recently proposed correlation filter based trackers have achieved appealing performance despite their great simplicity and superior speed. However, the filter input is a bounding box of fixed size, so these trackers cannot inherently adapt to changes in the target's scale and aspect ratio. Although scale-adaptive variants have been proposed, they are not flexible enough because of their pre-defined scale sampling manners. Moreover, to the best of our knowledge, no correlation filter variant has been proposed to handle aspect ratio variation. To tackle this problem, this paper integrates the class-agnostic detection proposal method, which is widely adopted in the object detection area, into a correlation filter tracker, and presents the KCFDP tracker. The correlation filter part of KCFDP is based on KCF[2] with some modifications. We extend the HOG feature in KCF to a combination of HOG, intensity, and color naming by simply concatenating the three features, resulting in 42 feature channels. The model updating scheme in KCF, which is simple linear interpolation, is replaced with a more robust scheme presented in [1]. EdgeBoxes[4] is adopted to generate flexible detection proposals and enable the scale and aspect ratio adaptability of our tracker. It traverses the whole image in a sliding window manner and scores every sampled bounding box according to the number of contours that are wholly enclosed. To accelerate EdgeBoxes and produce fewer unnecessary proposals, we set the minimum proposal area and aspect ratio range dynamically in sliding window sampling according to the current target size. In the tracking pipeline, KCF is first performed to estimate the preliminary target location $l_d$. Within a patch $z_d$ extracted from the current frame, KCF locates the target center according to the location of the maximum element in $f$: $f(z_d) = k^{x z_d} \cdot \alpha$ (1)

acm multimedia | 2009

Streaming HD H.264 encoder on programmable processors

Nan Wu; Mei Wen; Wei Wu; Ju Ren; Huayou Su; Changqing Xun; Chunyuan Zhang

Programmable processors have a great advantage over dedicated ASIC designs under intense time-to-market pressure. However, real-time encoding of high-definition (HD) H.264 video (up to 1080p) is a challenge for most existing programmable processors. On the other hand, model-based design is widely accepted for developing complex media programs. The stream model, an emerging model-based programming method, shows surprising efficiency in many compute-intensive domains, especially media processing. On this basis, this paper proposes a set of streaming techniques for H.264 encoding and then develops all of the code based on the X264 reference code. Our streaming H.264 encoder is a pure software implementation written entirely in a high-level language without special hardware or algorithm support. Real execution results show that our encoder achieves significant speedup over the original X264 encoder on various programmable architectures: 1.8x on the x86 Core 2 E8200, 3.7x on the MIPS 4KEc, 5.5x on the TMS320 C6416 DSP, and 6.1x on the stream processor STORM-SP16 G220. In particular, on the STORM processor the streaming encoder achieves 30.6 frames per second for a 1080p HD sequence, satisfying the real-time requirement. These results indicate that streaming is extremely efficient for this kind of media workload. Our work is also applicable to other media processing applications and provides architectural insights into dedicated ASIC or FPGA HD H.264 encoders.

international conference on parallel and distributed systems | 2012

A Parallel H.264 Encoder with CUDA: Mapping and Evaluation

Nan Wu; Mei Wen; Huayou Su; Ju Ren; Chunyuan Zhang

Efficient mapping of a real-time HD video application to graphics hardware is challenging. Developers face the challenges of choosing the right parallelism model, balancing thread granularity across the massive computing resources of the GPU, and partitioning tasks between the CPU and GPU. The paper illustrates the mapping approaches through the case of an HD H.264 encoder based on the X264 reference code and then evaluates it in depth on state-of-the-art CPUs and GPUs. We first split most of the computing tasks into Single-Instruction Multiple-Thread (SIMT) kernels, which are then chained into certain input/output data streams. We then implement a complete H.264 encoder on the Compute Unified Device Architecture (CUDA) platform. Finally, we present methods for exploiting multi-level parallelism and memory efficiency when mapping H.264 code, which we use to increase the efficiency of execution on GPUs. Our experimental results show that the GPU is used efficiently and that real-time encoding performance is achieved with CUDA.

international symposium on microarchitecture | 2008

On-Chip Memory System Optimization Design for the FT64 Scientific Stream Accelerator

Mei Wen; Nan Wu; Chunyuan Zhang; Qianming Yang; Jun Ren; Yi He; Wei Wu; Jun Chai; Maolin Guan; Changqing Xun

With the extension of application domains, hardware-managed memory structures such as caches are drawing attention for dealing with irregular stream applications. However, since a real application usually has both regular and irregular stream characteristics, conventional stream register files, caches, or combinations thereof have shortcomings. This article focuses on combining software- and hardware-managed memory structures and presents a new syncretic memory system based on the FT64 stream accelerator.

high performance computing and communications | 2007

Efficient broadcasting in multi-radio multi-channel and multi-hop wireless networks based on self-pruning

Li Li; Bin Qin; Chunyuan Zhang; Haiyan Li

An important question in multi-radio, multi-channel, multi-hop networks is how to perform efficient network-wide broadcast. Currently, almost all broadcasting protocols assume a single-radio single-channel network model. Simply using them in a multi-channel environment without careful enhancement results in unnecessary redundancy. In this paper, we focus on reducing the amount of redundant broadcast traffic in a multi-channel environment. We propose a general model for broadcasting and reduce the efficient broadcast problem to the minimal strongly connected dominating set problem of the interface-extend graph, which extends the original network topology across interfaces. Using the interface-extend graph, we describe our Multi-Channel Self-Pruning broadcast protocol, and simulation shows that our protocol can significantly reduce the transmission cost. To the best of our knowledge, our work is the first self-pruning broadcast scheme in this area.
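For readers unfamiliar with self-pruning, the sketch below shows the classic single-channel neighbor-coverage rule on a plain adjacency graph: a node that receives the broadcast from a sender stays silent if every one of its neighbors is already covered by that sender's transmission. This is not the paper's Multi-Channel Self-Pruning protocol and does not build an interface-extend graph; the graph, the node names, and the first-reception decision policy are assumptions made for this example.

```python
from collections import deque

def self_pruning_broadcast(adj, source):
    """Simulate a broadcast in which each node decides, on first reception,
    whether to retransmit. `adj` maps each node to the set of its neighbors.
    Returns (transmitters, received)."""
    transmitters = {source}
    received = {source} | adj[source]
    decided = {source}
    queue = deque((v, source) for v in sorted(adj[source]))
    while queue:
        v, u = queue.popleft()           # v heard the packet from u
        if v in decided:
            continue
        decided.add(v)
        # Neighbors of v that u's transmission did not already reach.
        uncovered = adj[v] - adj[u] - {u}
        if uncovered:
            transmitters.add(v)
            received |= adj[v]
            queue.extend((w, v) for w in sorted(adj[v]) if w not in decided)
    return transmitters, received

# Hypothetical four-node topology.
ADJ = {
    "s": {"a", "b"},
    "a": {"s", "b", "c"},
    "b": {"s", "a"},
    "c": {"a"},
}
TX, RX = self_pruning_broadcast(ADJ, "s")
```

In this topology only `s` and `a` transmit: `b` prunes itself because all of its neighbors were covered by `s`, and `c` prunes itself because it has no neighbor other than the sender. Blind flooding would have used four transmissions to reach the same nodes.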

annual computer security applications conference | 2004

Multiple-dimension scalable Adaptive Stream Architecture

Mei Wen; Nan Wu; Haiyan Li; Chunyuan Zhang

Intensive processing applications, such as scientific computation, signal processing, and graphics rendering, motivate new processor architectures that place new burdens on the designer. These applications, named stream applications, demand very high arithmetic rates and data bandwidth but lack data reuse. At present, modern VLSI technology makes arithmetic units relatively cheap. MASA (Multiple-dimension scalable Adaptive Stream Architecture), presented in this paper, is a prototype that operates on streams directly. It differs from DSPs and special high-performance single-chip architectures because it combines flexibility and high performance. It has the basic features of all stream processing: it provides a bandwidth hierarchy, keeps the ALU array executing under full load, and decomposes an application into a set of computation modules that execute with space multiplexing or time multiplexing. The multiple-dimension scalability of MASA, which covers the task level, loop level, instruction level, and data level, enables it to meet the demands of stream applications. This paper describes the MASA architecture and stream model in the first half, and explores the features and advantages of MASA by mapping stream applications to hardware in the second half.

The Journal of Supercomputing | 2013

Resource-efficient utilization of CPU/GPU-based heterogeneous supercomputers for Bayesian phylogenetic inference

Jun Chai; Huayou Su; Mei Wen; Xing Cai; Nan Wu; Chunyuan Zhang

Bayesian inference is one of the most important methods for estimating phylogenetic trees in bioinformatics. Due to the potentially huge computational requirements, several parallel algorithms for Bayesian inference have been implemented to run on CPU-based clusters, multicore CPUs, or small clusters of CPUs and GPUs. To the best of our knowledge, however, none of the existing methods is able to simultaneously and fully utilize both CPUs and GPUs for the computations, leaving idle either the CPU part or the GPU part of modern heterogeneous supercomputers. Aiming at an optimized utilization of heterogeneous computing resources, a promising hardware architecture for future bioinformatics applications, we present a new hybrid parallel algorithm and implementation of Bayesian phylogenetic inference, which combines MPI, OpenMP, and CUDA programming. The novelty of our algorithm, denoted as oMC3, is its ability to use CPU cores simultaneously with GPUs for the computations, while ensuring a fair work division between the two types of hardware components. We have implemented oMC3 based on MrBayes, which is one of the most popular software packages for Bayesian phylogenetic inference. Numerical experiments show that oMC3 obtains a 2.5× speedup over nMC3, which is a cutting-edge GPU implementation of MrBayes, on a single server consisting of two GPUs and sixteen CPU cores. Moreover, oMC3 scales nicely when 128 GPUs and 1536 CPU cores are in use.

embedded systems for real-time multimedia | 2009

Software parallel CAVLC encoder based on stream processing

Ju Ren; Yi He; Wei Wu; Mei Wen; Nan Wu; Chunyuan Zhang

Real-time encoding of high-definition H.264 video is a challenge for current embedded programmable processors. Emerging stream processing methods, supported by most GPUs and programmable processors, provide a powerful mechanism for achieving surprisingly high performance in media and signal processing, which brings an opportunity to deal with this challenge. However, traditional serial CAVLC has highly input-dependent execution and precedence constraints, which become a bottleneck for implementing an H.264 encoder efficiently. This paper presents a software parallel CAVLC encoder based on stream processing. Several approaches are explored to overcome the restrictions on parallelizing CAVLC caused by data dependency and branch/loop instructions. Experimental results show that our parallel CAVLC encoder achieves 3.03x and 2.08x speedup over the original serial CAVLC on the STORM and GPU stream processing platforms, respectively. Finally, the proposed parallel CAVLC encoder coupled with a stream processor enables real-time encoding of 1080p H.264 video.
