Rongxin Wu
Hong Kong University of Science and Technology
Network
Latest external collaboration on country level. Dive into details by clicking on the dots.
Publication
Featured researches published by Rongxin Wu.
foundations of software engineering | 2011
Rongxin Wu; Hongyu Zhang; Sunghun Kim; Shing Chi Cheung
Software defect information, including links between bugs and committed changes, plays an important role in software maintenance such as measuring quality and predicting defects. Usually, the links are automatically mined from change logs and bug reports using heuristics such as searching for specific keywords and bug IDs in change logs. However, the accuracy of these heuristics depends on the quality of change logs. Bird et al. found that there are many missing links due to the absence of bug references in change logs. They also found that the missing links lead to biased defect information, and it affects defect prediction performance. We manually inspected the explicit links, which have explicit bug IDs in change logs and observed that the links exhibit certain features. Based on our observation, we developed an automatic link recovery algorithm, ReLink, which automatically learns criteria of features from explicit links to recover missing links. We applied ReLink to three open source projects. ReLink reliably identified links with 89% precision and 78% recall on average, while the traditional heuristics alone achieve 91% precision and 64% recall. We also evaluated the impact of recovered links on software maintainability measurement and defect prediction, and found the results of ReLink yields significantly better accuracy than those of traditional heuristics.
international conference on software engineering | 2011
Sunghun Kim; Hongyu Zhang; Rongxin Wu; Liang Gong
Many software defect prediction models have been built using historical defect data obtained by mining software repositories (MSR). Recent studies have discovered that data so collected contain noises because current defect collection practices are based on optional bug fix keywords or bug report links in change logs. Automatically collected defect data based on the change logs could include noises. This paper proposes approaches to deal with the noise in defect data. First, we measure the impact of noise on defect prediction models and provide guidelines for acceptable noise level. We measure noise resistant ability of two well-known defect prediction algorithms and find that in general, for large defect datasets, adding FP (false positive) or FN (false negative) noises alone does not lead to substantial performance differences. However, the prediction performance decreases significantly when the dataset contains 20%-35% of both FP and FN noises. Second, we propose a noise detection and elimination algorithm to address this problem. Our empirical study shows that our algorithm can identify noisy instances with reasonable accuracy. In addition, after eliminating the noises using our algorithm, defect prediction accuracy is improved.
international conference on software engineering | 2012
Yingnong Dang; Rongxin Wu; Hongyu Zhang; Dongmei Zhang; Peter Nobel
Software often crashes. Once a crash happens, a crash report could be sent to software developers for investigation upon user permission. To facilitate efficient handling of crashes, crash reports received by Microsofts Windows Error Reporting (WER) system are organized into a set of “buckets”. Each bucket contains duplicate crash reports that are deemed as manifestations of the same bug. The bucket information is important for prioritizing efforts to resolve crashing bugs. To improve the accuracy of bucketing, we propose ReBucket, a method for clustering crash reports based on call stack matching. ReBucket measures the similarities of call stacks in crash reports and then assigns the reports to appropriate buckets based on the similarity values. We evaluate ReBucket using crash data collected from five widely-used Microsoft products. The results show that ReBucket achieves better overall performance than the existing methods. On average, the F-measure obtained by ReBucket is about 0.88.
automated software engineering | 2012
Ming Li; Hongyu Zhang; Rongxin Wu; Zhi-Hua Zhou
Software defect prediction can help us better understand and control software quality. Current defect prediction techniques are mainly based on a sufficient amount of historical project data. However, historical data is often not available for new projects and for many organizations. In this case, effective defect prediction is difficult to achieve. To address this problem, we propose sample-based methods for software defect prediction. For a large software system, we can select and test a small percentage of modules, and then build a defect prediction model to predict defect-proneness of the rest of the modules. In this paper, we describe three methods for selecting a sample: random sampling with conventional machine learners, random sampling with a semi-supervised learner and active sampling with active semi-supervised learner. To facilitate the active sampling, we propose a novel active semi-supervised learning method ACoForest which is able to sample the modules that are most helpful for learning a good prediction model. Our experiments on PROMISE datasets show that the proposed methods are effective and have potential to be applied to industrial practice.
international symposium on software testing and analysis | 2014
Rongxin Wu; Hongyu Zhang; Shing Chi Cheung; Sunghun Kim
Software crash is common. When a crash occurs, software developers can receive a report upon user permission. A crash report typically includes a call stack at the time of crash. An important step of debugging a crash is to identify faulty functions, which is often a tedious and labor-intensive task. In this paper, we propose CrashLocator, a method to locate faulty functions using the crash stack information in crash reports. It deduces possible crash traces (the failing execution traces that lead to crash) by expanding the crash stack with functions in static call graph. It then calculates the suspiciousness of each function in the approximate crash traces. The functions are then ranked by their suspiciousness scores and are recommended to developers for further investigation. We evaluate our approach using real-world Mozilla crash data. The results show that our approach is effective: we can locate 50.6%, 63.7% and 67.5% of crashing faults by examining top 1, 5 and 10 functions recommended by CrashLocator, respectively. Our approach outperforms the conventional stack-only methods significantly.
automated software engineering | 2016
Ming Wen; Rongxin Wu; Shing Chi Cheung
Various information retrieval (IR) based techniques have been proposed recently to locate bugs automatically at the file level. However, their usefulness is often compromised by the coarse granularity of files and the lack of contextual information. To address this, we propose to locate bugs using software changes, which offer finer granularity than files and provide important contextual clues for bug-fixing. We observe that bug inducing changes can facilitate the bug fixing process. For example, it helps triage the bug fixing task to the developers who committed the bug inducing changes or enables developers to fix bugs by reverting these changes. Our study further identifies that change logs and the naturally small granularity of changes can help boost the performance of IR-based bug localization. Motivated by these observations, we propose an IR-based approach Locus to locate bugs from software changes, and evaluate it on six large open source projects. The results show that Locus outperforms existing techniques at the source file level localization significantly. MAP and MRR in particular have been improved, on average, by 20.1% and 20.5%, respectively. Locus is also capable of locating the inducing changes within top 5 for 41.0% of the bugs. The results show that Locus can significantly reduce the number of lines needing to be scanned to locate the bug compared with existing techniques.
symposium on principles of programming languages | 2016
Rongxin Wu; Xiao Xiao; Shing Chi Cheung; Hongyu Zhang; Charles Zhang
Call traces, i.e., sequences of function calls and returns, are fundamental to a wide range of program analyses such as bug reproduction, fault diagnosis, performance analysis, and many others. The conventional approach to collect call traces that instruments each function call and return site incurs large space and time overhead. Our approach aims at reducing the recording overheads by instrumenting only a small amount of call sites while keeping the capability of recovering the full trace. We propose a call trace model and a logged call trace model based on an LL(1) grammar, which enables us to define the criteria of a feasible solution to call trace collection. Based on the two models, we prove that to collect call traces with minimal instrumentation is an NP-hard problem. We then propose an efficient approach to obtaining a suboptimal solution. We implemented our approach as a tool Casper and evaluated it using the DaCapo benchmark suite. The experiment results show that our approach causes significantly lower runtime (and space) overhead than two state-of-the-arts approaches.
international conference on software maintenance | 2010
Hongyu Zhang; Rongxin Wu
Many modern software systems are large, consisting of hundreds or even thousands of programs (source files). Understanding the overall quality of these programs is a resource and time-consuming activity. It is desirable to have a quick yet accurate estimation of the overall program quality in a cost-effective manner. In this paper, we propose a sampling based approach - for a large software project, we only sample a small percentage of source files, and then estimate the quality of the entire programs in the project based on the characteristics of the sample. Through experiments on public defect datasets, we show that we can successfully estimate the total number of defects, proportions of defective programs, defect distributions, and defect-proneness - all from a small sample of programs. Our experiments also show that small samples can achieve similar prediction accuracies as larger samples do.
foundations of software engineering | 2014
Rongxin Wu
Software crashes are severe manifestations of software faults. Especially, software crashes in production software usually result in bad user experiences. Therefore, crashing faults mostly are required to be fixed with a high priority. Diagnosing crashing faults on production software is non-trivial, due to the characteristics of production environment. In general, it is required to address two major challenges. First, crash reports in production software are usually numerous, since production software is used by a large number of end users in various environments and configurations. Especially, a single fault may manifest as different crash reports, which makes the prioritizing debugging and understanding faults difficult. Second, deployed software is required to run with minimal overhead and cannot afford a heavyweight instrumentation approach to collect program execution information. Furthermore, end users require that the logged information should not reveal sensitive production data. This thesis contributes for developing crashing fault diagnosis tools that can be used in production environment.
Empirical Software Engineering | 2018
Rongxin Wu; Ming Wen; Shing Chi Cheung; Hongyu Zhang
Software crashes are severe manifestations of software bugs. Debugging crashing bugs is tedious and time-consuming. Understanding software changes that induce a crashing bug can provide useful contextual information for bug fixing and is highly demanded by developers. Locating the bug inducing changes is also useful for automatic program repair, since it narrows down the root causes and reduces the search space of bug fix location. However, currently there are no systematic studies on locating the software changes to a source code repository that induce a crashing bug reflected by a bucket of crash reports. To tackle this problem, we first conducted an empirical study on characterizing the bug inducing changes for crashing bugs (denoted as crash-inducing changes). We also propose ChangeLocator, a method to automatically locate crash-inducing changes for a given bucket of crash reports. We base our approach on a learning model that uses features originated from our empirical study and train the model using the data from the historical fixed crashes. We evaluated ChangeLocator with six release versions of Netbeans project. The results show that it can locate the crash-inducing changes for 44.7%, 68.5%, and 74.5% of the bugs by examining only top 1, 5 and 10 changes in the recommended list, respectively. It significantly outperforms the existing state-of-the-art approach.