Zhenbang Chen
National University of Defense Technology
Publication
Featured research published by Zhenbang Chen.
International Conference on Cloud Computing | 2012
Pei Fan; Zhenbang Chen; Ji Wang; Zibin Zheng; Michael R. Lyu
Nowadays, more and more scientific applications are moving to cloud computing. The optimal deployment of scientific applications is critical for providing good services to users. Scientific applications are usually topology-aware applications. Therefore, considering the topology of a scientific application during the development will benefit the performance of the application. However, it is challenging to automatically discover and make use of the communication pattern of a scientific application while deploying the application on the cloud. To attack this challenge, in this paper, we propose a framework to discover the communication topology of a scientific application by pre-execution and multi-scale graph clustering, based on which the deployment can be optimized. Comprehensive experiments are conducted by employing a well-known MPI benchmark and comparing the performance of our method with those of other methods. The experimental results show the effectiveness of our topology-aware deployment method.
International Conference on Software Engineering | 2015
Yufeng Zhang; Zhenbang Chen; Ji Wang; Wei Dong; Zhiming Liu
A challenging problem in software engineering is to check if a program has an execution path satisfying a regular property. We propose a novel method of dynamic symbolic execution (DSE) to automatically find a path of a program satisfying a regular property. What makes our method distinct is that, when exploring the path space, DSE is guided by the synergy of static analysis and dynamic analysis to find a target path as soon as possible. We have implemented our guided DSE method for Java programs based on JPF and WALA, and applied it to 13 real-world open source Java programs, a total of 225K lines of code, for extensive experiments. The results show the effectiveness, efficiency, feasibility and scalability of the method. Compared with pure DSE on the time to find the first target path, the average speedup of the guided DSE is more than 258X when analyzing the programs that have more than 100 paths.
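To make the notion of a path satisfying a regular property concrete, the following is a minimal sketch, not the paper's implementation: a regular property is encoded as a small finite automaton and checked against the event sequence of one execution path. The property (a file is read after it has been closed), the event names, and the function `satisfies` are all hypothetical and chosen only for illustration.

```python
# Hypothetical regular property: "a file is read after it has been closed".
# States and event names are illustrative assumptions, not taken from the paper.
TRANSITIONS = {
    ("init", "open"): "opened",
    ("opened", "close"): "closed",
    ("closed", "open"): "opened",
    ("closed", "read"): "target",
}
ACCEPTING = {"target"}


def satisfies(path_events, start="init"):
    """Return True if the event sequence of a path drives the automaton
    into an accepting state, i.e. the path satisfies the property."""
    state = start
    for event in path_events:
        # Events with no listed transition leave the state unchanged.
        state = TRANSITIONS.get((state, event), state)
        if state in ACCEPTING:
            return True
    return False


if __name__ == "__main__":
    print(satisfies(["open", "read", "close", "read"]))
    print(satisfies(["open", "read", "close"]))
```

A path explorer would run such a check on each explored path; the guidance described in the abstract concerns which paths to explore first, which this sketch does not attempt to model.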
International Journal of Web and Grid Services | 2012
Pei Fan; Ji Wang; Zhenbang Chen; Zibin Zheng; Michael R. Lyu
Similar to Grid computing systems, scientific applications in cloud are large scale distributed systems that are deployed on distributed cloud nodes. Scientific applications usually have a lot of communications between the nodes for deployment. Therefore, conventional ranking methods are not appropriate for deploying scientific applications, because ranking methods do not consider the relations between nodes. We propose a novel spectral clustering based deployment method that takes not only the computing qualities of cloud nodes into account, but also the communication performance between different nodes. Experimental results show the effectiveness of our method for improving the performance of scientific applications.
Computer Software and Applications Conference | 2012
Haibo Mi; Huaimin Wang; Hua Cai; Yangfan Zhou; Michael R. Lyu; Zhenbang Chen
In large-scale cloud computing systems, the growing scale and complexity of component interactions pose great challenges for operators to understand the characteristics of system performance. Performance profiling has long been proved to be an effective approach to performance analysis; however, existing approaches do not consider two new requirements that emerge in cloud computing systems. First, the efficiency of the profiling becomes of critical concern; second, visual analytics should be utilized to make profiling results more readable. To address the above two issues, in this paper, we present P-Tracer, an online performance profiling approach specifically tailored for large-scale cloud computing systems. P-Tracer constructs a specific search engine that adopts a proactive way to process performance logs and generates particular indices for fast queries; furthermore, P-Tracer provides users with a suite of web-based interfaces to query statistical information of all kinds of services, which helps them quickly and intuitively understand system behavior. The approach has been successfully applied in Alibaba Cloud Computing Inc. to conduct online performance profiling both in production clusters and test clusters. Experience with one real-world case demonstrates that P-Tracer can effectively and efficiently help users conduct performance profiling and localize the primary causes of performance anomalies.
High Assurance Systems Engineering | 2015
Xianjin Fu; Zhenbang Chen; Yufeng Zhang; Chun Huang; Wei Dong; Ji Wang
Message Passing Interface (MPI) plays an important role in parallel computing. Many parallel applications are implemented as MPI programs. The existing methods of bug detection for MPI programs fall short of providing both input and non-determinism coverage, leading to missed bugs. In this paper, we employ symbolic execution to ensure the input coverage, and propose an on-the-fly scheduling algorithm to reduce the interleaving explorations for non-determinism coverage, while ensuring soundness and completeness. We have implemented our approach as a tool, called MPISE, which can automatically detect deadlocks and runtime bugs in MPI programs. The results of the experiments on benchmark programs and real world MPI programs indicate that MPISE finds bugs effectively and efficiently. In addition, our tool also provides diagnostic information and a replay mechanism to help understand bugs.
Service Oriented Software Engineering | 2014
Jingwen Zhou; Zhenbang Chen; Haibo Mi; Ji Wang
Trace-oriented runtime monitoring is a very effective method to improve the reliability of distributed systems. However, for medium-scale distributed systems, existing trace-oriented monitoring frameworks are either not powerful or efficient enough, or too complex and expensive to deploy and maintain. In this paper, we present MTracer, which is a lightweight trace-oriented monitoring system for medium-scale distributed systems. We have proposed and implemented several optimizations to improve the efficiency of the monitor server in MTracer. A web-based frontend is also provided to visualize a monitored system from different perspectives. We have validated MTracer in a real medium-scale environment. The results indicate that MTracer has very low overhead, and can handle more than 4000 events per second.
International Conference on Parallel and Distributed Systems | 2012
Pei Fan; Zhenbang Chen; Ji Wang; Zibin Zheng
Infrastructure-as-a-Service (IaaS) clouds provide on-demand virtual machines (VMs) to users. How to improve the quality of IaaS cloud services is important for service providers. Currently, the VMs in an IaaS cloud are usually deployed with respect to the maximum utilization of resources. In this paper, we propose an online VM optimization method for IaaS clouds. Our method mainly optimizes the VM deployment in IaaS clouds according to the traffic among VMs. VMs are allocated with respect to cabinet capacities at the beginning. At runtime, we monitor the traffic among VMs to get the traffic topology, based on which related VMs are migrated to neighbors to improve performance and reduce the traffic across cabinets. Preliminary simulation experiments are conducted on a well-known simulator, and the experimental results indicate that our method is effective and promising.
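The idea of migrating related VMs to reduce cross-cabinet traffic can be illustrated with a small greedy sketch. This is an assumption-laden toy, not the method evaluated in the paper: the traffic map, the one-VM-per-slot capacity model, and the function names `cross_cabinet_traffic` and `best_migration` are all hypothetical.

```python
def cross_cabinet_traffic(traffic, placement):
    """Sum the traffic of every VM pair whose two VMs sit in different cabinets."""
    return sum(volume for (a, b), volume in traffic.items()
               if placement[a] != placement[b])


def best_migration(traffic, placement, capacity):
    """Return (vm, target_cabinet, new_cost) for the single migration that
    lowers cross-cabinet traffic the most, or None if no move helps.
    Assumes each VM occupies one slot of its cabinet's capacity."""
    load = {}
    for cabinet in placement.values():
        load[cabinet] = load.get(cabinet, 0) + 1

    base = cross_cabinet_traffic(traffic, placement)
    best = None
    for vm, source in placement.items():
        for target in capacity:
            # Skip the current cabinet and cabinets that are already full.
            if target == source or load.get(target, 0) >= capacity[target]:
                continue
            trial = dict(placement)
            trial[vm] = target
            cost = cross_cabinet_traffic(traffic, trial)
            if cost < base and (best is None or cost < best[2]):
                best = (vm, target, cost)
    return best


if __name__ == "__main__":
    # Hypothetical monitored traffic volumes between VM pairs.
    traffic = {("a", "b"): 10, ("b", "c"): 1}
    placement = {"a": "c1", "b": "c2", "c": "c2"}
    capacity = {"c1": 2, "c2": 2}
    print(best_migration(traffic, placement, capacity))
```

Repeating the step until it returns None gives a simple local search; a real system would also need to weigh migration cost, which this sketch ignores.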
Dependable Systems and Networks | 2011
Xiang Rao; Huaimin Wang; Dianxi Shi; Zhenbang Chen; Hua Cai; Qi Zhou; Tingtao Sun
Extracting fault features with the error logs of fault injection tests has been widely studied in the area of large scale distributed systems for decades. However, the process of extracting features is severely affected by a large amount of noisy logs. While the existing work tries to solve the problem by compressing logs in temporal and spatial views or removing the semantic redundancy between logs, it fails to consider the co-existence of other noisy faults that generate error logs instead of injected faults, for example, random hardware faults, unexpected software bugs, system configuration faults or the error rank of a log severity. During a fault feature extraction process, those noisy faults generate error logs that are not related to a target fault, and will strongly mislead the resulting fault features. We call an error log that is not related to a target fault a noisy error log. To filter out noisy error logs, we present a similarity-based error log filtering method SBF, which consists of three integrated steps: (1) model error logs into time series and use the Haar wavelet transform to get the approximate time series; (2) divide the approximate time series into sub time series by valleys; (3) identify noisy error logs by comparing the similarity between the sub time series of target error logs and the template of noisy error logs. We apply our log filtering method in an enterprise cloud system and show its effectiveness. Compared with the existing work, we successfully filter out noisy error logs and increase the precision and the recall rate of fault feature extraction.
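The three steps named in the abstract can be sketched roughly as follows. This is a simplified illustration under stated assumptions, not the SBF implementation: the Haar approximation is taken as plain pairwise averaging (the un-normalized form), valleys are taken to be local minima, and cosine similarity stands in for whatever similarity measure the paper actually uses. All function names are hypothetical.

```python
from math import sqrt


def haar_approx(series, levels=1):
    """Step 1: approximate a time series by repeated pairwise averaging
    (un-normalized Haar approximation coefficients)."""
    out = list(series)
    for _ in range(levels):
        if len(out) % 2:          # pad odd-length input by repeating the last value
            out.append(out[-1])
        out = [(out[i] + out[i + 1]) / 2 for i in range(0, len(out), 2)]
    return out


def split_by_valleys(series):
    """Step 2: cut the series at local minima; each valley point ends one
    sub-series and starts the next."""
    segments, start = [], 0
    for i in range(1, len(series) - 1):
        if series[i] < series[i - 1] and series[i] <= series[i + 1]:
            segments.append(series[start:i + 1])
            start = i
    segments.append(series[start:])
    return segments


def cosine_similarity(a, b):
    """Step 3 (assumed measure): cosine similarity over the overlapping prefix."""
    n = min(len(a), len(b))
    a, b = a[:n], b[:n]
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = sqrt(sum(x * x for x in a))
    norm_b = sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


if __name__ == "__main__":
    counts = [2, 4, 6, 8, 2, 2, 6, 10]   # hypothetical error-log counts per time window
    approx = haar_approx(counts)
    print(approx, split_by_valleys(approx))
```

A sub-series whose similarity to a noisy-log template exceeds some chosen threshold would then be flagged as noise; the threshold is left open here because the abstract does not state one.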
International Symposium on Software Reliability Engineering | 2014
Jingwen Zhou; Zhenbang Chen; Ji Wang; Zibin Zheng; Wei Dong
Cloud computing provides a new paradigm for resource utilization and sharing. However, reliability problems, such as system failures, often occur in cloud systems and cause enormous losses. Trace-oriented monitoring is an important runtime method to improve the reliability of cloud systems. In this paper, we propose to bring runtime verification into trace-oriented monitoring, to facilitate the specification of monitoring requirements and to improve the efficiency of monitoring cloud systems. Based on a data set collected from a cloud storage system in a real environment, we validate our approach by monitoring the critical properties of the storage system. The preliminary experimental results indicate the promise of our approach.
IEEE International Conference on Cloud Computing Technology and Science | 2014
Jingwen Zhou; Zhenbang Chen; Ji Wang; Zibin Zheng; Michael R. Lyu
User request trace-oriented monitoring is an effective method to improve the reliability of cloud systems. However, there are some difficulties in getting traces in practice, which hinder the development of trace-oriented monitoring research. In this paper, we release a fine-grained user request-centric open trace data set, called TraceBench, collected on a real world cloud storage system deployed in a real environment. During collection, many aspects are considered to simulate different scenarios, including cluster size, request type, workload speed, etc. Besides recording the traces when the monitored system is running normally, we also collect the traces under the situation with faults injected. With a mature injection tool, 14 faults are introduced, including function faults and performance faults. The traces in TraceBench are clustered in different files, where each file corresponds to a certain scenario. The whole collection work lasted for more than half a year, resulting in more than 360,000 traces in 361 files. In addition, we also employ several applications based on TraceBench, which validate the helpfulness of TraceBench for the field of trace-oriented monitoring.