Network


Zhen Ming Jiang's latest external collaborations, at the country level.

Hotspot


The research topics in which Zhen Ming Jiang is active.

Publications


Featured research published by Zhen Ming Jiang.


International Conference on Software Maintenance | 2009

Automated performance analysis of load tests

Zhen Ming Jiang; Ahmed E. Hassan; Gilbert Hamann; Parminder Flora

The goal of a load test is to uncover functional and performance problems of a system under load. Performance problems refer to situations where a system suffers from unexpectedly high response time or low throughput. It is difficult to detect performance problems in a load test due to the absence of formally-defined performance objectives and the large amount of data that must be examined. In this paper, we present an approach which automatically analyzes the execution logs of a load test for performance problems. We first derive the system's performance baseline from previous runs. Then we perform an in-depth performance comparison against the derived baseline. Case studies show that our approach produces few false alarms (with a precision of 77%) and scales well to large industrial systems.
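
The paper does not spell out its comparison mechanics; the sketch below only illustrates the general baseline-then-compare idea, assuming a simple per-scenario mean-and-standard-deviation baseline (the scenario names and timings are hypothetical, and the actual approach works on execution logs rather than raw response times).

```python
import statistics

def derive_baseline(previous_runs):
    """Derive a per-scenario response-time baseline (mean, stdev)
    from the samples observed in previous load-test runs."""
    return {scenario: (statistics.mean(samples), statistics.stdev(samples))
            for scenario, samples in previous_runs.items()}

def flag_performance_problems(baseline, new_run, threshold=3.0):
    """Flag scenarios whose mean response time in the new run deviates
    from the baseline mean by more than `threshold` baseline stdevs."""
    flagged = []
    for scenario, samples in new_run.items():
        if scenario not in baseline:
            continue
        mean, stdev = baseline[scenario]
        if stdev > 0 and abs(statistics.mean(samples) - mean) / stdev > threshold:
            flagged.append(scenario)
    return flagged

# Hypothetical data: response times (ms) keyed by scenario name.
previous = {"login": [120, 130, 125, 128], "search": [300, 310, 305, 298]}
new = {"login": [122, 127, 131, 126], "search": [450, 470, 460, 455]}
print(flag_performance_problems(derive_baseline(previous), new))  # ['search']
```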


International Conference on Software Engineering | 2013

Assisting developers of big data analytics applications when deploying on hadoop clouds

Weiyi Shang; Zhen Ming Jiang; Hadi Hemmati; Bram Adams; Ahmed E. Hassan; Patrick Martin

Big data analytics is the process of examining large amounts of data (big data) in an effort to uncover hidden patterns or unknown correlations. Big Data Analytics Applications (BDA Apps) are a new type of software application, which analyze big data using massive parallel processing frameworks (e.g., Hadoop). Developers of such applications typically develop them using a small sample of data in a pseudo-cloud environment. Afterwards, they deploy the applications in a large-scale cloud environment with considerably more processing power and larger input data (reminiscent of the mainframe days). Working with BDA App developers in industry over the past three years, we noticed that the runtime analysis and debugging of such applications in the deployment phase cannot be easily addressed by traditional monitoring and debugging approaches. In this paper, as a first step in assisting developers of BDA Apps for cloud deployments, we propose a lightweight approach for uncovering differences between pseudo and large-scale cloud deployments. Our approach makes use of the readily-available yet rarely used execution logs from these platforms. Our approach abstracts the execution logs, recovers the execution sequences, and compares the sequences between the pseudo and cloud deployments. Through a case study on three representative Hadoop-based BDA Apps, we show that our approach can rapidly direct the attention of BDA App developers to the major differences between the two deployments. Knowledge of such differences is essential in verifying BDA Apps when analyzing big data in the cloud. Using injected deployment faults, we show that our approach not only significantly reduces the deployment verification effort, but also yields very few false positives when identifying deployment failures.
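
As a rough illustration of the abstract-then-compare idea (not the paper's actual implementation), the following sketch abstracts hypothetical Hadoop-style log lines with regex rules, groups them into per-task event sequences, and diffs the distinct sequences of the two deployments:

```python
import re
from collections import defaultdict

# Hypothetical abstraction rules: replace run-specific values with
# placeholders so log lines from different runs become comparable.
RULES = [
    (re.compile(r"attempt_\w+"), "<ID>"),
    (re.compile(r"\d+"), "<NUM>"),
]

def abstract(line):
    for pattern, placeholder in RULES:
        line = pattern.sub(placeholder, line)
    return line.strip()

def sequences_by_task(log):
    """Group abstracted events into one execution sequence per task.
    `log` is a list of (task_id, message) pairs."""
    seqs = defaultdict(list)
    for task_id, message in log:
        seqs[task_id].append(abstract(message))
    # The set of *distinct* sequences characterizes the deployment.
    return {tuple(events) for events in seqs.values()}

def deployment_diff(pseudo_log, cloud_log):
    """Sequences seen in only one of the two deployments are the
    candidates worth a developer's attention."""
    pseudo, cloud = sequences_by_task(pseudo_log), sequences_by_task(cloud_log)
    return pseudo - cloud, cloud - pseudo

pseudo = [("t1", "Task attempt_001 started"), ("t1", "Processed 500 records")]
cloud = [("t9", "Task attempt_987 started"),
         ("t9", "Processed 90000 records"),
         ("t9", "Retrying after 3 failures")]
only_pseudo, only_cloud = deployment_diff(pseudo, cloud)
print(only_cloud)  # the retry sequence appears only in the cloud run
```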


International Conference on Software Maintenance | 2008

Automatic identification of load testing problems

Zhen Ming Jiang; Ahmed E. Hassan; Gilbert Hamann; Parminder Flora

Many software applications must provide services to hundreds or thousands of users concurrently. These applications must be load tested to ensure that they can function correctly under high load. Problems in a load test stem from the load environment, the load generators, and the application under test; it is important to identify and address these problems to ensure that load testing results are correct. Detecting such problems is difficult due to the large amount of data which must be examined. Current industrial practice mainly involves time-consuming manual checks which, for example, grep the logs of the application for error messages. In this paper, we present an approach which mines the execution logs of an application to uncover its dominant behavior (i.e., execution sequences) and flags anomalies (i.e., deviations) from that dominant behavior. Through a case study of two open source and two large enterprise software applications, we show that our approach can automatically identify problems in a load test. Our approach flags < 0.01% of the log lines for closer analysis by domain experts. The flagged lines indicate load testing problems with a relatively small number of false alarms. Our approach scales well for large applications and is currently used daily in practice.
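
A minimal sketch of the dominant-behavior idea, on hypothetical session data; the paper's actual log abstraction and flagging are more involved:

```python
from collections import Counter

def flag_anomalies(event_sequences, dominance_threshold=0.95):
    """Flag execution sequences that deviate from the dominant behavior.
    `event_sequences` maps a session id to its tuple of abstracted log
    events; the most frequent sequences that together cover the bulk of
    sessions form the dominant behavior, everything else is flagged."""
    counts = Counter(event_sequences.values())
    total = sum(counts.values())
    dominant, covered = set(), 0
    for seq, n in counts.most_common():
        if covered / total >= dominance_threshold:
            break
        dominant.add(seq)
        covered += n
    return {sid: seq for sid, seq in event_sequences.items()
            if seq not in dominant}

# 99 normal sessions and one deviating session (invented example).
sessions = {f"s{i}": ("login", "query", "logout") for i in range(99)}
sessions["s99"] = ("login", "query", "error", "retry")
print(flag_anomalies(sessions))  # {'s99': ('login', 'query', 'error', 'retry')}
```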


Empirical Software Engineering and Measurement | 2010

Understanding the impact of code and process metrics on post-release defects: a case study on the Eclipse project

Emad Shihab; Zhen Ming Jiang; Walid M. Ibrahim; Bram Adams; Ahmed E. Hassan

Research studying the quality of software applications continues to grow rapidly, with researchers building regression models that combine a large number of metrics. However, these models are hard to deploy in practice due to the cost associated with collecting all the needed metrics, the complexity of the models, and their black-box nature. For example, techniques such as PCA merge a large number of metrics into composite metrics that are no longer easy to explain. In this paper, we use a statistical approach recently proposed by Cataldo et al. to create explainable regression models. A case study on the Eclipse open source project shows that only 4 out of 34 code and process metrics impact the likelihood of finding a post-release defect. In addition, our approach is able to quantify the impact of these metrics on the likelihood of finding post-release defects. Finally, we demonstrate that our simple models achieve performance comparable to more complex PCA-based models while providing practitioners with intuitive explanations for their predictions.
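
Explainable regression models of this kind are typically built with logistic regression; a toy sketch of fitting one on synthetic metric data (the metrics, coefficients, and data below are invented, and the paper's model-building procedure is not reproduced):

```python
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(0)

# Hypothetical file-level data: two code metrics and two process metrics.
n = 500
X = np.column_stack([
    rng.poisson(200, n),   # total lines of code
    rng.poisson(5, n),     # cyclomatic complexity
    rng.poisson(3, n),     # pre-release defects
    rng.poisson(10, n),    # prior changes (churn)
]).astype(float)

# Toy ground truth: only size and pre-release defects matter here.
logit = -3.0 + 0.004 * X[:, 0] + 0.3 * X[:, 2]
y = (rng.random(n) < 1.0 / (1.0 + np.exp(-logit))).astype(float)

model = sm.Logit(y, sm.add_constant(X)).fit(disp=False)
print(model.summary())       # coefficients with p-values: which metrics matter
print(np.exp(model.params))  # odds ratios: the size of each metric's impact
```

The odds ratios are what make such a model "explainable": a practitioner can read off how much each metric shifts the likelihood of a post-release defect.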


International Conference on Quality Software | 2010

Mining Performance Regression Testing Repositories for Automated Performance Analysis

King Chun Foo; Zhen Ming Jiang; Bram Adams; Ahmed E. Hassan; Ying Zou; Parminder Flora

Performance regression testing detects performance regressions in a system under load. Such regressions refer to situations where software performance degrades compared to previous releases, although the new version behaves correctly. In current practice, performance analysts must manually analyze performance regression testing data to uncover performance regressions. This process is both time-consuming and error-prone due to the large volume of metrics collected, the absence of formal performance objectives, and the subjectivity of individual performance analysts. In this paper, we present an automated approach to detect potential performance regressions in a performance regression test. Our approach compares new test results against correlations pre-computed from performance metrics extracted from performance regression testing repositories. Case studies show that our approach scales well to large industrial systems, and detects performance problems that are often overlooked by performance analysts.
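
One possible reading of the correlation-based comparison, sketched on synthetic counters (the tolerance, counter names, and data are invented; the paper's repository mining is far richer):

```python
import numpy as np

def baseline_correlations(runs):
    """Average the pairwise correlation matrices of the performance
    counters over prior (passing) test runs in the repository.
    Each run is an (observations x counters) array."""
    return np.mean([np.corrcoef(run, rowvar=False) for run in runs], axis=0)

def regression_score(baseline, new_run, tolerance=0.4):
    """Count counter pairs whose correlation in the new run drifts from
    the baseline by more than `tolerance`; a high count suggests a
    potential performance regression worth an analyst's attention."""
    drift = np.abs(np.corrcoef(new_run, rowvar=False) - baseline)
    return int(np.sum(np.triu(drift > tolerance, k=1)))

rng = np.random.default_rng(1)

def make_run(broken=False):
    # Two counters: offered load and CPU. Normally CPU tracks load;
    # in the "broken" run that correlation no longer holds.
    load = rng.random(200)
    cpu = (rng.random(200) if broken else load) + rng.normal(0, 0.05, 200)
    return np.column_stack([load, cpu])

baseline = baseline_correlations([make_run() for _ in range(5)])
print(regression_score(baseline, make_run()))             # 0: correlations hold
print(regression_score(baseline, make_run(broken=True)))  # 1: CPU decoupled
```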


Foundations of Software Engineering | 2012

An industrial study on the risk of software changes

Emad Shihab; Ahmed E. Hassan; Bram Adams; Zhen Ming Jiang

Modelling and understanding bugs has been the focus of much software engineering research today. However, organizations are interested in more than just bugs. In particular, they are more concerned about managing risk, i.e., the likelihood that a code or design change will cause a negative impact on their products and processes, regardless of whether or not it introduces a bug. In this paper, we conduct a year-long study involving more than 450 developers of a large enterprise, spanning more than 60 teams, to better understand risky changes, i.e., changes for which developers believe that additional attention is needed in the form of careful code or design review and/or more testing. Our findings show that different developers and different teams have their own criteria for determining risky changes. Using factors extracted from the changes and the history of the files modified by the changes, we are able to accurately identify risky changes with a recall of more than 67%, and a precision improvement of 87% (using developer-specific models) and 37% (using team-specific models) over a random model. We find that the number of lines and chunks of code added by the change, the bugginess of the files being changed, the number of bug reports linked to a change, and the developer's experience are the best indicators of change risk. In addition, we find that when a change has many related changes, the reliability of developers in marking risky changes is negatively affected. Our findings and models are being used in practice today to manage the risk of software projects.
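
As an illustration only: a single classifier trained on synthetic change factors mirroring the indicators named above, standing in for one developer-specific model (the study's real features, models, and data are not reproduced here):

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import precision_score, recall_score

rng = np.random.default_rng(2)

# Hypothetical per-change factors, mirroring the best indicators above.
n = 400
X = np.column_stack([
    rng.poisson(40, n),       # lines added
    rng.poisson(4, n),        # chunks added
    rng.poisson(2, n),        # prior bugs in the touched files
    rng.integers(0, 3, n),    # linked bug reports
    rng.integers(1, 120, n),  # developer experience (months)
]).astype(float)
signal = 0.02 * X[:, 0] + 0.5 * X[:, 2] - 0.01 * X[:, 4] - 1.5
y = rng.random(n) < 1.0 / (1.0 + np.exp(-signal))  # toy "risky" labels

# The study builds separate models per developer / per team; one model
# stands in here for a single developer's change history.
train, test = slice(0, 300), slice(300, None)
clf = LogisticRegression(max_iter=1000).fit(X[train], y[train])
pred = clf.predict(X[test])
print(f"recall={recall_score(y[test], pred):.2f}",
      f"precision={precision_score(y[test], pred):.2f}")
```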


International Conference on Performance Engineering | 2012

Automated detection of performance regressions using statistical process control techniques

Thanh H. D. Nguyen; Bram Adams; Zhen Ming Jiang; Ahmed E. Hassan; Mohamed N. Nasser; Parminder Flora

The goal of performance regression testing is to check for performance regressions in a new version of a software system. It is an important phase in the software development process, yet it is very time consuming and there is usually little time assigned for it. A typical test run outputs thousands of performance counters, which testers usually have to inspect manually to identify performance regressions. In this paper, we propose an approach to analyze performance counters across test runs using a statistical process control technique called control charts. We evaluate our approach using historical data of a large software team as well as an open-source software project. The results show that our approach can accurately identify performance regressions in both software systems. Feedback from practitioners is very promising due to the simplicity and ease of explanation of the results.
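
Control charts themselves are standard: a center line at the baseline mean, control limits at plus or minus three sigma, and a violation ratio for the new run. A small sketch on synthetic CPU counters (the counters, builds, and thresholds are hypothetical, not the paper's data):

```python
import numpy as np

def control_limits(baseline_runs):
    """Center line (CL) and control limits (LCL/UCL = CL +/- 3 sigma)
    for a performance counter, computed from prior passing test runs."""
    samples = np.concatenate(baseline_runs)
    cl, sigma = samples.mean(), samples.std(ddof=1)
    return cl - 3 * sigma, cl, cl + 3 * sigma

def violation_ratio(new_run, lcl, ucl):
    """Fraction of the new run's samples outside the control limits;
    a high ratio signals a potential performance regression."""
    new_run = np.asarray(new_run)
    return np.mean((new_run < lcl) | (new_run > ucl))

rng = np.random.default_rng(3)
baseline = [rng.normal(55.0, 2.0, 500) for _ in range(4)]  # % CPU, old builds
lcl, cl, ucl = control_limits(baseline)

ok_run = rng.normal(55.0, 2.0, 500)
regressed_run = rng.normal(63.0, 2.0, 500)  # CPU crept up in the new build
print(f"ok build:        {violation_ratio(ok_run, lcl, ucl):.1%}")
print(f"regressed build: {violation_ratio(regressed_run, lcl, ucl):.1%}")
```

The appeal reported by practitioners follows directly from this simplicity: a violation ratio is a single, easily explained number per counter.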


Mining Software Repositories | 2009

MapReduce as a general framework to support research in Mining Software Repositories (MSR)

Weiyi Shang; Zhen Ming Jiang; Bram Adams; Ahmed E. Hassan

Researchers continue to demonstrate the benefits of Mining Software Repositories (MSR) for supporting software development and research activities. However, as the mining process is time and resource intensive, they often create their own distributed platforms and use various optimizations to speed up and scale up their analysis. These platforms are project-specific, hard to reuse, and offer minimal debugging and deployment support. In this paper, we propose the use of MapReduce, a distributed computing platform, to support research in MSR. As a proof-of-concept, we migrate J-REX, an optimized evolutionary code extractor, to run on Hadoop, an open source implementation of MapReduce. Through a case study on the source control repositories of the Eclipse, BIRT and Datatools projects, we demonstrate that the migration effort to MapReduce is minimal and that the benefits are significant, as the running time of the migrated J-REX is only 30% to 50% of the original J-REX's. This paper documents our experience with the migration, and highlights the benefits and challenges of the MapReduce framework in the MSR community.
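
For flavor, here is a toy MapReduce-style job counting changes per file, with an in-process driver simulating Hadoop's shuffle so the sketch runs anywhere; the record format and the job itself are invented stand-ins for what an extractor like J-REX computes:

```python
from collections import defaultdict

# Map phase: one change record -> (file path, 1). In a real Hadoop job
# these two functions would be the Mapper and Reducer.
def map_change(record):
    revision, author, path = record.split("\t")
    yield path, 1

def reduce_counts(path, counts):
    yield path, sum(counts)

def run_job(records):
    """Tiny local driver: apply the map, group by key (the 'shuffle'),
    then apply the reduce to each key's values."""
    shuffle = defaultdict(list)
    for record in records:
        for key, value in map_change(record):
            shuffle[key].append(value)
    return dict(kv for key, values in shuffle.items()
                for kv in reduce_counts(key, values))

# Hypothetical change log, one "<revision>\t<author>\t<file>" per record.
log = ["r1\talice\tsrc/parser.c",
       "r2\tbob\tsrc/parser.c",
       "r3\talice\tsrc/lexer.c"]
print(run_job(log))  # {'src/parser.c': 2, 'src/lexer.c': 1}
```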


International Conference on Software Engineering | 2014

Detecting performance anti-patterns for applications developed using object-relational mapping

Tse-Hsun Chen; Weiyi Shang; Zhen Ming Jiang; Ahmed E. Hassan; Mohamed N. Nasser; Parminder Flora

Object-Relational Mapping (ORM) provides developers a conceptual abstraction for mapping application code to the underlying databases. ORM is widely used in industry due to its convenience, permitting developers to focus on developing the business logic without worrying too much about the database access details. However, developers often write ORM code without considering its impact on database performance, leading to transactions that time out or hang in large-scale systems. Unfortunately, there is little support to help developers automatically detect suboptimal database accesses. In this paper, we propose an automated framework to detect ORM performance anti-patterns. Our framework automatically flags performance anti-patterns in the source code. Furthermore, as there could be hundreds or even thousands of instances of anti-patterns, our framework provides support to prioritize performance bug fixes based on a statistically rigorous performance assessment. We have successfully evaluated our framework on two open source systems and one large-scale industrial system. Our case studies show that our framework can detect new and known real-world performance bugs, and that fixing the detected performance anti-patterns can improve the system response time by up to 98%.
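
The framework targets Java ORM code, but the detection idea can be sketched on Python source with the `ast` module: flag ORM-style data accesses issued inside loops, the classic "one query per row" (N+1) anti-pattern. The call names and example code below are hypothetical:

```python
import ast

# Hypothetical names of ORM data-access calls to look for inside loops.
ORM_CALLS = {"get", "filter", "fetch", "find"}

def find_antipatterns(source):
    """Return (line, call) pairs for ORM-style calls nested in a loop:
    each loop iteration would issue its own database query."""
    flagged = []
    tree = ast.parse(source)
    loops = (n for n in ast.walk(tree) if isinstance(n, (ast.For, ast.While)))
    for loop in loops:
        for node in ast.walk(loop):
            if (isinstance(node, ast.Call)
                    and isinstance(node.func, ast.Attribute)
                    and node.func.attr in ORM_CALLS):
                flagged.append((node.lineno, node.func.attr))
    return flagged

code = """
for order in orders:
    customer = session.get(Customer, order.customer_id)  # one query per order
    print(customer.name)
"""
print(find_antipatterns(code))  # [(3, 'get')]
```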


Mining Software Repositories | 2006

Examining the evolution of code comments in PostgreSQL

Zhen Ming Jiang; Ahmed E. Hassan

It is common, especially in large software systems, for developers to change code without updating its associated comments due to their unfamiliarity with the code or due to time constraints. This is a potential problem since outdated comments may confuse or mislead developers who perform future development. Using data recovered from CVS, we study the evolution of code comments in the PostgreSQL project. Our study reveals that over time the percentage of commented functions remains constant except for early fluctuation due to the commenting style of a particular active developer.
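
The headline measure, the percentage of commented functions, could be approximated as below; this regex heuristic is an invented stand-in for the paper's actual analysis of CVS snapshots, not its implementation:

```python
import re

# A rough heuristic, not a C parser: a function definition is a line like
# "type name(args)" at column 0 followed by "{", and it counts as commented
# if a comment block ends within the few lines above it.
FUNC_RE = re.compile(r"^[A-Za-z_][\w\s\*]*\b(\w+)\s*\([^;]*\)\s*$")

def commented_function_ratio(source):
    lines = source.splitlines()
    total = commented = 0
    for i, line in enumerate(lines):
        if (FUNC_RE.match(line) and i + 1 < len(lines)
                and lines[i + 1].lstrip().startswith("{")):
            total += 1
            window = "\n".join(lines[max(0, i - 3):i])
            if "*/" in window or "//" in window:
                commented += 1
    return commented / total if total else 0.0

sample = """\
/* Compute the checksum. */
static int checksum(const char *buf)
{
    return 0;
}

static int helper(int x)
{
    return x;
}
"""
print(commented_function_ratio(sample))  # 0.5
```

Tracking this ratio over each revision in version control gives the evolution curve the study reports for PostgreSQL.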

Collaboration


Zhen Ming Jiang's top co-authors.

Top Co-Authors

Bram Adams
École Polytechnique de Montréal

Meiyappan Nagappan
Rochester Institute of Technology