Publication


Featured research published by David Bowes.


IEEE Transactions on Software Engineering | 2012

A Systematic Literature Review on Fault Prediction Performance in Software Engineering

Tracy Hall; Sarah Beecham; David Bowes; David Gray; Steven Counsell

Background: The accurate prediction of where faults are likely to occur in code can help direct test effort, reduce costs, and improve the quality of software. Objective: We investigate how the context of models, the independent variables used, and the modeling techniques applied influence the performance of fault prediction models. Method: We used a systematic literature review to identify 208 fault prediction studies published from January 2000 to December 2010. We synthesize the quantitative and qualitative results of 36 studies which report sufficient contextual and methodological information according to the criteria we develop and apply. Results: The models that perform well tend to be based on simple modeling techniques such as Naive Bayes or Logistic Regression. Combinations of independent variables have been used by models that perform well. Feature selection has been applied to these combinations when models are performing particularly well. Conclusion: The methodology used to build models seems to be influential to predictive performance. Although there are a set of fault prediction studies in which confidence is possible, more studies are needed that use a reliable methodology and which report their context, methodology, and performance comprehensively.
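
As a rough illustration of the review's headline finding (that simple techniques such as Naive Bayes and Logistic Regression, applied to combinations of metrics with feature selection, tend to perform well), the sketch below shows what such a model-building pipeline could look like. The file name, column names and parameter choices are assumptions, not details from any reviewed study.

```python
# Minimal sketch (not from the paper): simple fault-prediction models of the
# kind the review found to perform well, with feature selection over a
# combination of code metrics. "metrics.csv" and "faulty" are hypothetical.
import pandas as pd
from sklearn.feature_selection import SelectKBest, f_classif
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score
from sklearn.naive_bayes import GaussianNB
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

data = pd.read_csv("metrics.csv")          # one row per module, hypothetical layout
X = data.drop(columns=["faulty"])
y = data["faulty"]

for name, model in [
    ("Naive Bayes", GaussianNB()),
    ("Logistic Regression", LogisticRegression(max_iter=1000)),
]:
    # Feature selection applied to the combined metrics, as several
    # well-performing models in the review did.
    pipeline = make_pipeline(StandardScaler(), SelectKBest(f_classif, k=10), model)
    scores = cross_val_score(pipeline, X, y, cv=10, scoring="f1")
    print(f"{name}: mean F1 = {scores.mean():.2f}")
```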


IEEE Transactions on Software Engineering | 2014

Researcher Bias: The Use of Machine Learning in Software Defect Prediction

Martin J. Shepperd; David Bowes; Tracy Hall

Background. The ability to predict defect-prone software components would be valuable. Consequently, there have been many empirical studies to evaluate the performance of different techniques endeavouring to accomplish this effectively. However, no one technique dominates, and so designing a reliable defect prediction model remains problematic. Objective. We seek to make sense of the many conflicting experimental results and understand which factors have the largest effect on predictive performance. Method. We conduct a meta-analysis of all relevant, high-quality primary studies of defect prediction to determine what factors influence predictive performance. This is based on 42 primary studies that satisfy our inclusion criteria and collectively report 600 sets of empirical prediction results. By reverse engineering a common response variable, we build a random effects ANOVA model to examine the relative contribution of four model-building factors (classifier, data set, input metrics and researcher group) to model prediction performance. Results. Surprisingly, we find that the choice of classifier has little impact upon performance (1.3 percent) and, in contrast, the major (31 percent) explanatory factor is the researcher group. It matters more who does the work than what is done. Conclusion. To overcome this high level of researcher bias, defect prediction researchers should (i) conduct blind analysis, (ii) improve reporting protocols and (iii) conduct more intergroup studies in order to alleviate expertise issues. Lastly, research is required to determine whether this bias is prevalent in other application domains.
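
For readers unfamiliar with the analysis, the following sketch shows the general shape of apportioning prediction performance across the four model-building factors. It is a fixed-effects approximation for illustration only (the paper builds a random effects ANOVA model), and the response variable, file name and column names are assumptions.

```python
# Illustrative sketch only: attributing variation in a prediction-performance
# response to classifier, data set, input metrics and researcher group.
# Column names ("mcc", "classifier", ...) and the file are hypothetical.
import pandas as pd
import statsmodels.api as sm
import statsmodels.formula.api as smf

results = pd.read_csv("prediction_results.csv")   # one row per reported result

model = smf.ols(
    "mcc ~ C(classifier) + C(dataset) + C(metrics) + C(researcher_group)",
    data=results,
).fit()
anova = sm.stats.anova_lm(model, typ=2)

# Share of total variation (including the residual row) attributable to each factor.
anova["pct_of_total"] = 100 * anova["sum_sq"] / anova["sum_sq"].sum()
print(anova[["sum_sq", "pct_of_total"]])
```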


International Conference on Software Engineering | 2014

Some Code Smells Have a Significant but Small Effect on Faults

Tracy Hall; Min Zhang; David Bowes; Yi Sun

We investigate the relationship between faults and five of Fowler et al.'s least-studied smells in code: Data Clumps, Switch Statements, Speculative Generality, Message Chains, and Middle Man. We developed a tool to detect these five smells in three open-source systems: Eclipse, ArgoUML, and Apache Commons. We collected fault data from the change and fault repositories of each system. We built Negative Binomial regression models to analyse the relationships between smells and faults and report the McFadden effect size of those relationships. Our results suggest that Switch Statements had no effect on faults in any of the three systems; Message Chains increased faults in two systems; Message Chains which occurred in larger files reduced faults; Data Clumps reduced faults in Apache and Eclipse but increased faults in ArgoUML; Middle Man reduced faults only in ArgoUML; and Speculative Generality reduced faults only in Eclipse. File size alone affects faults in some systems but not in all systems. Where smells did significantly affect faults, the size of that effect was small (always under 10 percent). Our findings suggest that some smells do indicate fault-prone code in some circumstances but that the effect that these smells have on faults is small. Our findings also show that smells have different effects on different systems. We conclude that arbitrary refactoring is unlikely to significantly reduce fault-proneness and in some cases may increase fault-proneness.
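
A minimal sketch of the modelling step described, assuming a per-file table of fault counts, a smell indicator and file size; this is not the authors' tooling, and the file and column names are hypothetical.

```python
# Rough sketch: a Negative Binomial regression of fault counts on the presence
# of one smell plus file size, with McFadden's pseudo R-squared reported as an
# effect-size indicator. "files.csv" and its columns are assumptions.
import pandas as pd
import statsmodels.formula.api as smf

files = pd.read_csv("files.csv")   # one row per file: faults, has_smell, sloc

nb = smf.negativebinomial("faults ~ has_smell + sloc", data=files).fit()
print(nb.summary())
print(f"McFadden pseudo R-squared: {nb.prsquared:.3f}")
```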


International Conference on Engineering Applications of Neural Networks | 2009

Using the Support Vector Machine as a Classification Method for Software Defect Prediction with Static Code Metrics

David Gray; David Bowes; Neil Davey; Yi Sun; Bruce Christianson

The automated detection of defective modules within software systems could lead to reduced development costs and more reliable software. In this work, the static code metrics for a collection of modules contained within eleven NASA data sets are used with a Support Vector Machine classifier. A rigorous sequence of pre-processing steps was applied to the data prior to classification, including the balancing of both classes (defective or otherwise) and the removal of a large number of repeating instances. The Support Vector Machine in this experiment yields an average accuracy of 70% on previously unseen data.
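
A minimal sketch of the kind of pipeline described (duplicate removal, class balancing, then SVM classification of held-out data); the file name, column names and balancing-by-undersampling choice are assumptions rather than the authors' exact procedure.

```python
# Illustration only, under assumed column names: drop repeated instances,
# balance the classes, then evaluate an SVM on previously unseen data.
import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

data = pd.read_csv("nasa_module_metrics.csv")    # hypothetical file name
data = data.drop_duplicates()                    # remove repeating instances

# Balance classes by undersampling the (assumed) majority non-defective class.
defective = data[data["defective"] == 1]
clean = data[data["defective"] == 0].sample(n=len(defective), random_state=0)
balanced = pd.concat([defective, clean])

X = balanced.drop(columns=["defective"])
y = balanced["defective"]
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, stratify=y, random_state=0
)

svm = make_pipeline(StandardScaler(), SVC(kernel="rbf"))
svm.fit(X_train, y_train)
print(f"Accuracy on unseen data: {svm.score(X_test, y_test):.2f}")
```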


IET Software | 2012

Reflections on the NASA MDP data sets

David Gray; David Bowes; Neil Davey; Yi Sun; Bruce Christianson

Background: The NASA Metrics Data Program (MDP) data sets have been heavily used in software defect prediction research. Aim: To highlight the data quality issues present in these data sets, and the problems that can arise when they are used in a binary classification context. Method: A thorough exploration of all 13 original NASA data sets, followed by various experiments demonstrating the potential impact of duplicate data points when data mining. Conclusions: Firstly, researchers need to analyse the data that forms the basis of their findings in the context of how it will be used. Secondly, the bulk of defect prediction experiments based on the NASA MDP data sets may have led to erroneous findings. This is mainly because of repeated/duplicate data points potentially causing substantial amounts of training and testing data to be identical.
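
The data-quality concern can be checked directly. The sketch below, under an assumed column layout, counts exact duplicates and measures how many feature vectors end up in both training and test data after a random split; it is an illustration, not the paper's experimental setup.

```python
# Illustrative check for repeated data points that can make training and
# testing data overlap after a random split. File and column names assumed.
import pandas as pd
from sklearn.model_selection import train_test_split

data = pd.read_csv("kc1.csv")    # hypothetical: one of the NASA MDP data sets

n_dupes = data.duplicated().sum()
print(f"{n_dupes} of {len(data)} rows are exact duplicates")

train, test = train_test_split(data, test_size=0.3, random_state=0)
feature_cols = [c for c in data.columns if c != "defective"]

# Feature vectors that appear (at least once) in both the training and test sets.
overlap = pd.merge(train[feature_cols].drop_duplicates(),
                   test[feature_cols].drop_duplicates(),
                   how="inner")
print(f"{len(overlap)} feature vectors appear in both training and test data")
```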


Proceedings of the 2nd International Workshop on Evidential Assessment of Software Technologies | 2012

SLuRp: a tool to help large complex systematic literature reviews deliver valid and rigorous results

David Bowes; Tracy Hall; Sarah Beecham

Background: Systematic literature reviews are increasingly used in software engineering. Most systematic literature reviews require several hundred papers to be examined and assessed. This is not a trivial task and can be time consuming and error-prone. Aim: We present SLuRp, our open-source, web-enabled database that supports the management of systematic literature reviews. Method: We describe the functionality of SLuRp and explain how it supports all phases in a systematic literature review. Results: We show how we used SLuRp in our SLR. We discuss how SLuRp enabled us to generate complex results in which we had confidence. Conclusions: SLuRp supports all phases of an SLR and enables reliable results to be generated. If we are to have confidence in the outcomes of SLRs, it is essential that such automated systems are used.


Evaluation and Assessment in Software Engineering | 2016

The jinx on the NASA software defect data sets

Jean Petrić; David Bowes; Tracy Hall; Bruce Christianson; Nathan Baddoo

Background: The NASA datasets have previously been used extensively in studies of software defects. In 2013, Shepperd et al. presented an essential set of rules for removing erroneous data from the NASA datasets, making the data more reliable to use. Objective: We have now found additional rules necessary for removing problematic data which were not identified by Shepperd et al. Results: In this paper, we demonstrate the level of erroneous data still present even after cleaning using Shepperd et al.'s rules and apply our new rules to remove this erroneous data. Conclusion: Even after systematic data cleaning of the NASA MDP datasets, we found new erroneous data. Data quality should always be explicitly considered by researchers before use.
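
To make the idea of explicit cleaning rules concrete, here is a hypothetical example of integrity checks over a NASA MDP-style table. These are not the rules proposed by Shepperd et al. or by this paper, and the column names are assumptions about the attribute layout.

```python
# Hypothetical integrity checks in the spirit of explicit data-cleaning rules;
# NOT the rules from Shepperd et al. or from this paper. Columns are assumed.
import pandas as pd

data = pd.read_csv("pc1.csv")    # hypothetical NASA MDP data set

problems = (
    (data["LOC_EXECUTABLE"] > data["LOC_TOTAL"])   # executable LOC exceeds total LOC
    | (data["LOC_TOTAL"] == 0)                     # empty module still recorded
    | data.duplicated()                            # exact repeated instance
)
print(f"Flagged {problems.sum()} of {len(data)} rows for removal")
cleaned = data[~problems]
```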


Software Engineering and Advanced Applications | 2009

Fault Analysis in OSS Based on Program Slicing Metrics

Sue E. Black; Steve Counsell; Tracy Hall; David Bowes

In this paper, we investigate the Barcode OSS using two of Weiser's original slice-based metrics (Tightness and Overlap) as a basis, complemented with fault data extracted from multiple versions of the same system. We compared the values of the metrics in functions with at least one reported fault against those in fault-free functions to determine a) whether significant differences in the two metrics would be observed and b) whether those metrics might allow prediction of faulty functions. Results revealed some interesting traits of the Tightness metric and, in particular, how low values of that metric seemed to indicate fault-prone functions. A significant difference was found between the Tightness metric values for faulty functions when compared to fault-free functions, suggesting that Tightness is the ‘better’ of the two metrics in this sense. The Overlap metric seemed less sensitive to differences between the two types of function.
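
For reference, the two metrics are commonly formulated over the set of statements in each output variable's slice. The sketch below uses that common formulation with a toy example; it is not the authors' measurement tooling, and the exact definitions used in the paper may differ in detail.

```python
# Sketch of the two slice-based metrics using commonly cited formulations,
# with slices represented as sets of statement numbers. Illustration only.
def tightness(slices, module_length):
    """Proportion of the module's statements contained in every slice."""
    common = set.intersection(*slices)
    return len(common) / module_length

def overlap(slices):
    """Mean proportion of each slice that lies in the intersection of all slices."""
    common = set.intersection(*slices)
    return sum(len(common) / len(s) for s in slices) / len(slices)

# Toy example: a 10-statement function with slices for two output variables.
slices = [{1, 2, 3, 4, 7}, {2, 3, 4, 8, 9}]
print(tightness(slices, module_length=10))   # 0.3
print(overlap(slices))                       # (3/5 + 3/5) / 2 = 0.6
```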


Predictive Models in Software Engineering | 2012

Comparing the performance of fault prediction models which report multiple performance measures: recomputing the confusion matrix

David Bowes; Tracy Hall; David Gray

There are many hundreds of fault prediction models published in the literature. The predictive performance of these models is often reported using a variety of different measures. Most performance measures are not directly comparable. This lack of comparability means that it is often difficult to evaluate the performance of one model against another. Our aim is to present an approach that allows other researchers and practitioners to transform many performance measures of categorical studies back into a confusion matrix. Once performance is expressed as a confusion matrix, alternative preferred performance measures can then be derived. Our approach has enabled us to compare the performance of 600 models published in 42 studies. We demonstrate the application of our approach on several case studies, and discuss the advantages and implications of doing this.
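
As a worked illustration of the general idea (not the paper's exact transformation rules, which depend on which measures a study reports), the sketch below recovers a confusion matrix from reported precision and recall plus the number of instances and defective instances, then derives a preferred measure such as MCC from it.

```python
# Recovering a confusion matrix from commonly reported measures, then deriving
# MCC. The reported figures below are hypothetical, for illustration only.
import math

def confusion_from_precision_recall(n, n_defective, precision, recall):
    tp = recall * n_defective              # recall = TP / (TP + FN)
    fn = n_defective - tp
    fp = tp * (1 - precision) / precision  # precision = TP / (TP + FP)
    tn = n - tp - fn - fp
    return tp, fp, fn, tn

def mcc(tp, fp, fn, tn):
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    return ((tp * tn) - (fp * fn)) / denom if denom else 0.0

tp, fp, fn, tn = confusion_from_precision_recall(
    n=1000, n_defective=150, precision=0.60, recall=0.70
)
print(tp, fp, fn, tn)                      # 105.0, 70.0, 45.0, 780.0
print(f"MCC = {mcc(tp, fp, fn, tn):.2f}")  # ~0.58
```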


International Conference on Machine Learning and Applications | 2012

The State of Machine Learning Methodology in Software Fault Prediction

Tracy Hall; David Bowes

The aim of this paper is to investigate the quality of methodology in software fault prediction studies using machine learning. Over two hundred studies of fault prediction have been published in the last 10 years. There is evidence to suggest that the quality of methodology used in some of these studies does not allow us to have confidence in the predictions reported by them. We evaluate the machine learning methodology used in 21 fault prediction studies. All of these studies use NASA data sets. We score each study from 1 to 10 in terms of the quality of their machine learning methodology (e.g. whether or not studies report randomising their cross-validation folds). Only 10 out of the 21 studies scored 5 or more out of 10. Furthermore, one study scored only 1 out of 10. When we plot these scores over time, there is no evidence that the quality of machine learning methodology is better in recent studies. Our results suggest that there remains much to be done by both researchers and reviewers to improve the quality of machine learning methodology used in software fault prediction. We conclude that the results reported in some studies need to be treated with caution.
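
As a small illustration of one of the methodological points scored (whether cross-validation folds are randomised), the sketch below shows a shuffled, stratified cross-validation set-up; the classifier choice and names are placeholders, not the scoring scheme itself.

```python
# Randomised, stratified cross-validation: shuffle=True randomises fold
# membership instead of following the data set's original ordering, and the
# random_state makes the randomisation reproducible and reportable.
from sklearn.model_selection import StratifiedKFold, cross_val_score
from sklearn.naive_bayes import GaussianNB

def evaluate(X, y, random_state=0):
    cv = StratifiedKFold(n_splits=10, shuffle=True, random_state=random_state)
    return cross_val_score(GaussianNB(), X, y, cv=cv)
```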

Collaboration


Dive into David Bowes's collaborations.

Top Co-Authors

Tracy Hall, Brunel University London
Steve Counsell, Brunel University London
David Gray, University of Hertfordshire
Bruce Christianson, University of Hertfordshire
Jean Petrić, University of Hertfordshire
Neil Davey, University of Hertfordshire
Yi Sun, University of Hertfordshire
Thomas Shippey, University of Hertfordshire
Sue E. Black, University of Westminster