Haipeng Cai | Researchain

Archive Network Publication Hotspot Collaboration

Network

Latest external collaboration on country level. Dive into details by clicking on the dots.

Explore More

Hotspot

Dive into the research topics where Haipeng Cai is active.

Explore More

Publication

Featured researches published by Haipeng Cai.

state of the art in java program analysis | 2013

DUA-forensics: a fine-grained dependence analysis and instrumentation framework based on Soot

Raúl A. Santelices; Yiji Zhang; Haipeng Cai; Siyuan Jiang

We describe DUA-Forensics, our open-source Java-bytecode program analysis and instrumentation system built on top of Soot. DUA-Forensics has been in development for more than six years and has supported multiple research projects on efficient monitoring, test-suite augmentation, fault localization, symbolic execution, and change-impact analysis. Three core features of Soot have proven essential: the Java bytecode processor, the Jimple intermediate representation, and the API to access and manipulate Jimple programs. On top of these foundations, DUA-Forensics offers a number of features of potential interest to the Java-analysis community, including (1) a layer that facilitates the instrumentation of Jimple code, (2) a library modeling system for efficient points-to, data-flow, and symbolic analysis, and (3) a fine-grained dependence analysis component. These features have made our own research more productive, reliable, and effective.

international conference on software engineering | 2013

Quantitative program slicing: separating statements by relevance

Raúl A. Santelices; Yiji Zhang; Siyuan Jiang; Haipeng Cai; Ying-Jie Zhang

Program slicing is a popular but imprecise technique for identifying which parts of a program affect or are affected by a particular value. A major reason for this imprecision is that slicing reports all program statements possibly affected by a value, regardless of how relevant to that value they really are. In this paper, we introduce quantitative slicing (q-slicing), a novel approach that quantifies the relevance of each statement in a slice. Q-slicing helps users and tools focus their attention first on the parts of slices that matter the most. We present two methods for quantifying slices and we show the promise of q-slicing for a particular application: predicting the impacts of changes.

automated software engineering | 2014

Diver: precise dynamic impact analysis using dependence-based trace pruning

Haipeng Cai; Raúl A. Santelices

Impact analysis determines the effects that the behavior of program entities, or changes to them, can have on the rest of the system. Dynamic impact analysis is one practical form that computes smaller impact sets than static alternatives for concrete sets of executions. However, existing dynamic approaches can still produce impact sets that are too large to be useful. To address this problem, we present a novel dynamic impact analysis called DIVER that exploits static dependencies to identify runtime impacts much more precisely without reducing safety and at acceptable costs. Our preliminary empirical evaluation shows that DIVER can significantly increase the precision of dynamic impact analysis.

Journal of Systems and Software | 2015

A comprehensive study of the predictive accuracy of dynamic change-impact analysis

Haipeng Cai; Raúl A. Santelices

We comprehensively study the predictive accuracy of dynamic change-impact analysis.We assess this accuracy with large numbers of both artificial and repository changes.We found that dynamic impact analyses can suffer from both low precision and low recall.The accuracy for typical repository changes can be even lower than for random changes.Short executions of programs can be associated with an even lower recall. The correctness of software is affected by its constant changes. For that reason, developers use change-impact analysis to identify early the potential consequences of changing their software. Dynamic impact analysis is a practical technique that identifies potential impacts of changes for representative executions. However, it is unknown how reliable its results are because their accuracy has not been studied. This paper presents the first comprehensive study of the predictive accuracy of dynamic impact analysis in two complementary ways. First, we use massive numbers of random changes across numerous Java applications to cover all possible change locations. Then, we study more than 100 changes from software repositories, which are representative of developer practices. Our experimental approach uses sensitivity analysis and execution differencing to systematically measure the precision and recall of dynamic impact analysis with respect to the actual impacts observed for these changes. Our results for both types of changes show that the most cost-effective dynamic impact analysis known is surprisingly inaccurate with an average precision of 38-50% and average recall of 50-56% in most cases. This comprehensive study offers insights on the effectiveness of existing dynamic impact analyses and motivates the future development of more accurate impact analyses.

source code analysis and manipulation | 2014

SENSA: Sensitivity Analysis for Quantitative Change-Impact Prediction

Haipeng Cai; Siyuan Jiang; Raúl A. Santelices; Ying-Jie Zhang; Yiji Zhang

Sensitivity analysis determines how a system responds to stimuli variations, which can benefit important software-engineering tasks such as change-impact analysis. We present SENSA, a novel dynamic-analysis technique and tool that combines sensitivity analysis and execution differencing to estimate the dependencies among statements that occur in practice. In addition to identifying dependencies, SENSA quantifies them to estimate how much or how likely a statement depends on another. Quantifying dependencies helps developers prioritize and focus their inspection of code relationships. To assess the benefits of quantifying dependencies with SENSA, we applied it to various statements across Java subjects to find and prioritize the potential impacts of changing those statements. We found that SENSA predicts the actual impacts of changes to those statements more accurately than static and dynamic forward slicing. Our SENSA prototype tool is freely available for download.

SERE '14 Proceedings of the 2014 Eighth International Conference on Software Security and Reliability | 2014

Estimating the Accuracy of Dynamic Change-Impact Analysis Using Sensitivity Analysis

Haipeng Cai; Raúl A. Santelices; Tianyu Xu

The reliability and security of software are affected by its constant changes. For that reason, developers use change-impact analysis early to identify the potential consequences of changing a program location. Dynamic impact analysis, in particular, identifies potential impacts on concrete, typical executions. However, the accuracy (precision and recall) of dynamic impact analyses for predicting the actual impacts of changes has not been studied. In this paper, we present a novel approach based on sensitivity analysis and execution differencing to estimate, for the first time, the accuracy of dynamic impact analyses. Unlike approaches that only use software repositories, which might not be available or might contain insufficient changes, our approach makes changes to every part of the software to identify actually impacted code and compare it with the predictions of dynamic impact analysis. Using this approach in addition to changes made by other researchers on multiple Java subjects, we estimated the accuracy of the best method-level dynamic impact analysis in the literature. Our results suggest that dynamic impact analysis can be surprisingly inaccurate with an average precision of 47-52% and recall of 56-87%. This study offers insights to developers into the effectiveness of existing dynamic impact analyses and motivates the future development of more accurate analyses.

ieee international conference on software analysis evolution and reengineering | 2015

A framework for cost-effective dependence-based dynamic impact analysis

Haipeng Cai; Raúl A. Santelices

Dynamic impact analysis can greatly assist developers with managing software changes by focusing their attention on the effects of potential changes relative to concrete program executions. While dependence-based dynamic impact analysis (DDIA) provides finer-grained results than traceability-based approaches, traditional DDIA techniques often produce imprecise results, incurring excessive costs thus hindering their adoption in many practical situations. In this paper, we present the design and evaluation of a DDIA framework and its three new instances that offer not only much more precise impact sets but also flexible cost-effectiveness options to meet diverse application needs such as different budgets and levels of detail of results. By exploiting both static dependencies and various dynamic information including method-execution traces, statement coverage, and dynamic points-to data, our techniques achieve that goal at reasonable costs according to our experiment results. Our study also suggests that statement coverage has generally stronger effects on the precision and cost-effectiveness of DDIA than dynamic points-to data.

ieee international conference on software analysis evolution and reengineering | 2015

TRACERJD: Generic trace-based dynamic dependence analysis with fine-grained logging

Haipeng Cai; Raúl A. Santelices

We present the design and implementation of TRACERJD, a toolkit devoted to dynamic dependence analysis via fine-grained whole-program dependence tracing. TRACERJD features a generic framework for efficient offline analysis of dynamic dependencies, including those due to exception-driven control flows. Underlying the framework is a hierarchical trace indexing scheme by which TRACERJD maintains the relationships among execution events at multiple levels of granularity while capturing those events at runtime. Built on this framework, several application tools are provided as well, including a dynamic slicer and a performance profiler. These example applications also demonstrate the flexibility and ease with which a variety of client analyses can be built based on the framework. We tested our toolkit on four Java subjects, for which the results suggest promising efficiency of TRACERJD for its practical use in various dependence-based tasks.

source code analysis and manipulation | 2014

On the Accuracy of Forward Dynamic Slicing and Its Effects on Software Maintenance

Siyuan Jiang; Raúl A. Santelices; Mark Grechanik; Haipeng Cai

Dynamic slicing is a practical and popular analysis technique used in various software-engineering tasks. Dynamic slicing is known to be incomplete because it analyzes only a subset of all possible executions of a program. However, it is less known that its results may inaccurately represent the dependencies that occur in those executions. Some researchers have identified this problem and developed extensions such as relevant slicing, which incorporates static information. Yet, dynamic slicing continues to be widely used, even though the extent of its inaccuracy is not well understood, which can affect the benefits of this analysis. In this paper, we present an approach to assess the accuracy of forward dynamic slices, which are used in software maintenance and evolution tasks. Because finding all actual dependencies is an undecidable problem, our approach instead computes bounds of the precision and recall of forward dynamic slices. Our approach uses sensitivity analysis and execution differencing to find a subset of all program statements that truly depend at runtime on another statement. Using this approach, we studied the accuracy of many forward dynamic slices from a variety of Java applications. Our results show that forward dynamic slicing can have low recall -- for dependencies in the analyzed executions -- and some potential imprecision. We also conducted a case study that shows how this inaccuracy affects a software maintenance task. To the best of our knowledge, ours is the first work that quantifies the intrinsic limitations of dynamic slicing.

ACM Transactions on Software Engineering and Methodology | 2016

D ia P ro : Unifying Dynamic Impact Analyses for Improved and Variable Cost-Effectiveness

Haipeng Cai; Raúl A. Santelices; Douglas Thain

Impact analysis not only assists developers with change planning and management, but also facilitates a range of other client analyses, such as testing and debugging. In particular, for developers working in the context of specific program executions, dynamic impact analysis is usually more desirable than static approaches, as it produces more manageable and relevant results with respect to those concrete executions. However, existing techniques for this analysis mostly lie on two extremes: either fast, but too imprecise, or more precise, yet overly expensive. In practice, both more cost-effective techniques and variable cost-effectiveness trade-offs are in demand to fit a variety of usage scenarios and budgets of impact analysis. This article aims to fill the gap between these two extremes with an array of cost-effective analyses and, more broadly, to explore the cost and effectiveness dimensions in the design space of impact analysis. We present the development and evaluation of DiaPro, a framework that unifies a series of impact analyses, including three new hybrid techniques that combine static and dynamic analyses. Harnessing both static dependencies and multiple forms of dynamic data including method-execution events, statement coverage, and dynamic points-to sets, DiaPro prunes false-positive impacts with varying strength for variant effectiveness and overheads. The framework also facilitates an in-depth examination of the effects of various program information on the cost-effectiveness of impact analysis. We applied DiaPro to ten Java applications in diverse scales and domains, evaluating it thoroughly on both arbitrary and repository-based queries from those applications. We show that the three new analyses are all significantly more effective than existing alternatives while remaining efficient, and the DiaPro framework, as a whole, provides flexible cost-effectiveness choices for impact analysis with the best options for variable needs and budgets. Our study results also suggest that hybrid techniques tend to be much more cost-effective than purely dynamic approaches, in general, and that statement coverage has mostly stronger effects than dynamic points-to sets on the cost-effectiveness of dynamic impact analysis, while static dependencies have even stronger effects than both forms of dynamic data.

Explore More