
Publication


Featured research published by Michelle Cartwright.


IEEE Transactions on Software Engineering | 2000

An empirical investigation of an object-oriented software system

Michelle Cartwright; Martin J. Shepperd

The paper describes an empirical investigation into an industrial object-oriented (OO) system comprising 133,000 lines of C++. The system was a subsystem of a telecommunications product and was developed using the Shlaer-Mellor method (S. Shlaer and S.J. Mellor, 1988; 1992). From this study, we found that there was little use of OO constructs such as inheritance and, therefore, polymorphism. It was also found that there was a significant difference in the defect densities between those classes that participated in inheritance structures and those that did not, with the former being approximately three times more defect-prone. We were able to construct useful prediction systems for size and number of defects based upon simple counts such as the number of states and events per class. Although these prediction systems are likely to have only local significance, the more general principle is that software developers can build their own local prediction systems. Moreover, we believe this is possible even in the absence of the suites of metrics that have been advocated by researchers into OO technology. As a consequence, measurement technology may be accessible to a wider group of potential users.
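The paper's closing suggestion, that developers can build their own local prediction systems from simple counts, can be illustrated with a short sketch. The per-class counts and defect totals below are invented for illustration; the paper's actual data and fitted models are not reproduced here.

```python
# Minimal sketch of a "local" defect prediction system built from simple
# per-class counts, in the spirit of the paper. All data are invented.
import numpy as np

# Hypothetical per-class measurements: (number of states, number of events)
X = np.array([[3, 5], [7, 12], [2, 4], [9, 15], [5, 8], [4, 6]], dtype=float)
defects = np.array([1, 4, 0, 6, 2, 2], dtype=float)  # observed defect counts

# Ordinary least squares: defects ~ b0 + b1*states + b2*events
A = np.column_stack([np.ones(len(X)), X])
coef, *_ = np.linalg.lstsq(A, defects, rcond=None)

new_class = np.array([1.0, 6.0, 10.0])  # a class with 6 states, 10 events
print("predicted defects:", new_class @ coef)
```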


IEEE Transactions on Software Engineering | 2006

Software defect association mining and defect correction effort prediction

Qinbao Song; Martin J. Shepperd; Michelle Cartwright; Carolyn Mair

Much current software defect prediction work focuses on the number of defects remaining in a software system. In this paper, we present association rule mining-based methods to predict defect associations and defect correction effort. This is to help developers detect software defects and assist project managers in allocating testing resources more effectively. We applied the proposed methods to the SEL defect data, consisting of more than 200 projects over more than 15 years. The results show that, for defect association prediction, the accuracy is very high and the false-negative rate is very low. Likewise, for defect correction effort prediction, the accuracy of both defect isolation effort prediction and defect correction effort prediction is also high. We compared the defect correction effort prediction method with other types of methods (PART, C4.5, and Naive Bayes) and show that accuracy is improved by at least 23 percent. We also evaluated the impact of support and confidence levels on prediction accuracy, false-negative rate, false-positive rate, and the number of rules. We found that higher support and confidence levels may not result in higher prediction accuracy, and that a sufficient number of rules is a precondition for high prediction accuracy.
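As a rough illustration of what defect association mining involves, the sketch below derives simple "defect A implies defect B" rules from invented defect reports using support and confidence thresholds; the paper's SEL data and its specific mining algorithm are not reproduced.

```python
# Sketch of association rule mining over defect reports: find rules
# "defect A => defect B" whose support and confidence clear a threshold.
# The transactions below are invented.
from itertools import permutations

reports = [
    {"interface", "logic"},
    {"interface", "logic", "data"},
    {"logic", "data"},
    {"interface"},
    {"interface", "logic"},
]
min_support, min_confidence = 0.4, 0.7
n = len(reports)

def support(itemset):
    return sum(itemset <= r for r in reports) / n

for a, b in permutations({d for r in reports for d in r}, 2):
    s = support({a, b})
    if s >= min_support:
        conf = s / support({a})
        if conf >= min_confidence:
            print(f"{a} => {b}  support={s:.2f} confidence={conf:.2f}")
```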


Information & Software Technology | 1998

An empirical view of inheritance

Michelle Cartwright

For some years software engineering researchers have been advocating object-oriented (OO) methods as a powerful approach to overcoming many of the difficulties associated with software development. A central concept within OO is the use of the inheritance mechanism, primarily with a view to enhancing opportunities for reuse. This paper reviews the evidence we have concerning the use and the impact of inheritance within OO software. A striking result is how little the mechanism is used in practice. The paper then goes on to describe an experiment conducted at Bournemouth University to investigate the impact of class inheritance upon the maintenance of C++ software. The experiment was a replication of an experiment previously conducted at the University of Strathclyde. Two versions of a bibliographic database program were used: one used inheritance and the other a flat structure. It was found that subjects took significantly longer to make the same change on the program with inheritance; however, their changes were more compact. Interestingly, the Strathclyde group found that the time to make changes for the inheritance version was reduced. We conclude that there is an urgent need for further empirical work to study the impact of the inheritance mechanism using more subjects and in an industrial setting.
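The analysis at the heart of such an experiment reduces to a two-sample significance test on maintenance times. A minimal sketch, with invented timings and a non-parametric test standing in for whatever analysis the original studies used:

```python
# Sketch of the experiment's core comparison: do subjects modifying the
# inheritance version take longer than those on the flat version?
# Timings (minutes) are invented for illustration.
from scipy import stats

flat = [34, 41, 29, 38, 45, 33, 40]
inheritance = [48, 55, 42, 60, 51, 47, 58]

# Non-parametric test: small samples, no normality assumption.
u, p = stats.mannwhitneyu(inheritance, flat, alternative="greater")
print(f"U={u}, p={p:.4f}")  # small p: inheritance changes took longer
```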


IEEE International Software Metrics Symposium | 2003

Dealing with missing software project data

Michelle Cartwright; Martin J. Shepperd; Qinbao Song

Whilst there is a general consensus that quantitative approaches are an important part of successful software project management, there has been relatively little research into many of the obstacles to data collection and analysis in the real world. One feature that characterises many of the data sets we deal with is missing or highly questionable values. Naturally this problem is not unique to software engineering, so we explore the application of two existing data imputation techniques that have been used to good effect elsewhere. In order to assess the potential value of imputation we use two industrial data sets. Both are quite problematic from an effort modelling perspective because they contain few cases, have a significant number of missing values, and the projects are quite heterogeneous. We compare the quality of fit of effort models derived by stepwise regression on the raw data with that of models derived from data sets whose values were imputed by various techniques. In both data sets we find that k-nearest neighbour (k-NN) and sample mean imputation (SMI) significantly improve the model fit, with k-NN giving the best results. These results are consistent with other recently published results; consequently, we conclude that imputation can assist empirical software engineering.
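A minimal sketch of the two imputation techniques the paper compares, using scikit-learn's stock imputers on an invented project data set (the industrial data sets themselves are not reproduced):

```python
# Sketch of k-NN and sample-mean imputation on a small, invented
# project data set with missing values (np.nan).
import numpy as np
from sklearn.impute import KNNImputer, SimpleImputer

X = np.array([
    [120.0, 5.0, 300.0],
    [200.0, np.nan, 520.0],
    [90.0, 4.0, np.nan],
    [np.nan, 7.0, 610.0],
    [150.0, 6.0, 410.0],
])  # columns: size, team size, effort

knn = KNNImputer(n_neighbors=2).fit_transform(X)       # k-NN imputation
smi = SimpleImputer(strategy="mean").fit_transform(X)  # sample mean imputation
print(knn, smi, sep="\n\n")
```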


Empirical Software Engineering | 2000

On Building Prediction Systems for Software Engineers

Martin J. Shepperd; Michelle Cartwright; Gada F. Kadoda

Building and evaluating prediction systems is an important activity for software engineering researchers. Increasing numbers of techniques and datasets are now being made available. Unfortunately, systematic comparison is hindered by the use of different accuracy indicators and evaluation processes. We argue that these indicators are statistics that describe properties of the estimation errors or residuals, and that the sensible choice of indicator is largely governed by the goals of the estimator. For this reason it may be helpful for researchers to provide a range of indicators. We also argue that it is useful to formally test for significant differences between competing prediction systems, and note that where only a few cases are available this can be problematic; in other words, the research instrument may have insufficient power. We demonstrate that this is the case for a well-known empirical study of cost models. Simulation, however, could be one means of overcoming this difficulty.
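The indicators discussed here are statistics over residuals, so they are easy to sketch. MMRE and Pred(25) below are two common choices from this literature (the paper's own selection may differ), followed by a paired significance test on absolute residuals:

```python
# Sketch: accuracy indicators as statistics over residuals, plus a
# significance test between two competing prediction systems.
# Actuals and estimates are invented.
import numpy as np
from scipy.stats import wilcoxon

actual = np.array([100.0, 250.0, 80.0, 400.0, 150.0, 220.0])
est_a = np.array([120.0, 230.0, 95.0, 350.0, 160.0, 200.0])
est_b = np.array([90.0, 300.0, 70.0, 480.0, 120.0, 260.0])

def mmre(actual, est):
    # Mean magnitude of relative error.
    return np.mean(np.abs(actual - est) / actual)

def pred25(actual, est):
    # Proportion of estimates within 25% of the actual value.
    return np.mean(np.abs(actual - est) / actual <= 0.25)

print("A:", mmre(actual, est_a), pred25(actual, est_a))
print("B:", mmre(actual, est_b), pred25(actual, est_b))

# Paired test on absolute residuals; with so few cases power is low,
# which is exactly the difficulty the paper highlights.
stat, p = wilcoxon(np.abs(actual - est_a), np.abs(actual - est_b))
print(f"Wilcoxon p={p:.3f}")
```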


Empirical Software Engineering | 2005

A Short Note on Safest Default Missingness Mechanism Assumptions

Qinbao Song; Martin J. Shepperd; Michelle Cartwright

A very common problem when building software engineering models is dealing with missing data. To address this there exists a range of imputation techniques. However, selecting the appropriate imputation technique can also be a difficult problem. One reason for this is that these techniques make assumptions about the underlying missingness mechanism, that is, how the missing values are distributed within the data set. This is compounded by the fact that, for small data sets, it may be very difficult to determine what the missingness mechanism is. This means there is a danger of using an inappropriate imputation technique. Therefore, it is necessary to determine the safest default assumption about the missingness mechanism for imputation techniques when dealing with small data sets. We examine experimentally two simple and commonly used techniques, Class Mean Imputation (CMI) and k Nearest Neighbors (k-NN), coupled with two missingness mechanisms: missing completely at random (MCAR) and missing at random (MAR). We draw two conclusions. First, for our analysis CMI is the preferred technique since it is more accurate. Second, and more importantly, the impact of the missingness mechanism on imputation accuracy is not statistically significant. This is a useful finding since it suggests that, even for small data sets, we can reasonably make the weaker assumption that the missingness mechanism is MAR. Thus both imputation techniques have practical application for small software engineering data sets with missing values.
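Class Mean Imputation is simple enough to sketch directly: each missing value is replaced by the mean of that feature within the case's class. The small data set below is invented:

```python
# Sketch of Class Mean Imputation (CMI): fill each missing value with
# the mean of that column computed within the case's class.
import numpy as np
import pandas as pd

df = pd.DataFrame({
    "cls":    ["A", "A", "A", "B", "B", "B"],
    "size":   [100.0, np.nan, 120.0, 300.0, 280.0, np.nan],
    "effort": [10.0, 12.0, np.nan, 35.0, np.nan, 40.0],
})

imputed = df.copy()
for col in ["size", "effort"]:
    # Within each class, replace NaN with that class's column mean.
    imputed[col] = df.groupby("cls")[col].transform(lambda s: s.fillna(s.mean()))
print(imputed)
```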


International Conference on Case-Based Reasoning | 2001

Issues on the Effective Use of CBR Technology for Software Project Prediction

Gada F. Kadoda; Michelle Cartwright; Martin J. Shepperd

This paper explores some of the practical issues associated with the use of case-based reasoning (CBR) or estimation by analogy for software project effort prediction. Different research teams have reported varying experiences with this technology. We take the view that the problems hindering the effective use of CBR technology are twofold. First, the underlying characteristics of the datasets play a major role in determining which prediction technique is likely to be most effective. Second, when CBR is that technique, we find that configuring a CBR system can also have a significant impact upon predictive capabilities. In this paper we examine the performance of CBR when applied to various datasets using stepwise regression (SWR) as a benchmark. We also explore the impact of the choice of number of analogies and the size of the training dataset when making predictions.
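Estimation by analogy amounts to retrieving the k most similar past projects and adapting their known efforts. A minimal sketch with invented project data; the distance measure, normalisation, and adaptation rule are illustrative choices, not the configuration studied in the paper:

```python
# Sketch of estimation by analogy: normalise features, retrieve the k
# nearest past projects by Euclidean distance, predict the mean effort.
# All project data are invented.
import numpy as np

past = np.array([[100, 5], [200, 8], [150, 6], [300, 12], [120, 5]], float)
effort = np.array([10.0, 22.0, 15.0, 40.0, 12.0])
target = np.array([160, 7], float)

# Min-max normalisation so no single feature dominates the distance.
lo, hi = past.min(axis=0), past.max(axis=0)
scale = lambda x: (x - lo) / (hi - lo)

dists = np.linalg.norm(scale(past) - scale(target), axis=1)
k = 3  # number of analogies; the paper shows this choice matters
nearest = np.argsort(dists)[:k]
print("predicted effort:", effort[nearest].mean())
```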


International Conference on Software Engineering | 2006

Ensemble of missing data techniques to improve software prediction accuracy

Bhekisipho Twala; Michelle Cartwright; Martin J. Shepperd

Software engineers are commonly faced with the problem of incomplete data. Incomplete data can reduce system performance in terms of predictive accuracy. Unfortunately, little research has been conducted to systematically explore the impact of missing values, especially from the missing data handling point of view, so the relative merits of the various missing data techniques (MDTs) remain unclear. This paper describes a systematic comparison of seven MDTs using eight industrial datasets. Our findings from an empirical evaluation suggest listwise deletion is the least effective technique for handling incomplete data, while multiple imputation achieves the highest accuracy rates. We further propose and show how a combination of MDTs, obtained by randomizing a decision tree building algorithm, leads to a significant improvement in prediction performance for proportions of missing values of up to 50%.
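The paper's headline contrast, listwise deletion versus imputation before learning, is straightforward to sketch. Note that scikit-learn's IterativeImputer is used below as a stand-in for multiple imputation, and the data are synthetic; the seven MDTs and the randomized-tree ensemble are not reproduced:

```python
# Sketch contrasting listwise deletion with model-based imputation
# before fitting a decision tree. Data are invented.
import numpy as np
from sklearn.experimental import enable_iterative_imputer  # noqa: F401
from sklearn.impute import IterativeImputer
from sklearn.tree import DecisionTreeRegressor

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 3))
y = X @ np.array([2.0, -1.0, 0.5]) + rng.normal(scale=0.2, size=200)
X[rng.random(X.shape) < 0.3] = np.nan  # 30% of values missing at random

# Listwise deletion: drop every case with any missing value.
keep = ~np.isnan(X).any(axis=1)
tree_del = DecisionTreeRegressor(random_state=0).fit(X[keep], y[keep])
print("cases kept after deletion:", keep.sum(), "of", len(X))

# Imputation: fill values in first, so every case is retained.
X_imp = IterativeImputer(random_state=0).fit_transform(X)
tree_imp = DecisionTreeRegressor(random_state=0).fit(X_imp, y)
print("cases available after imputation:", len(X_imp), "of", len(X))
```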


International Symposium on Empirical Software Engineering | 2005

Comparison of various methods for handling incomplete data in software engineering databases

Bhekisipho Twala; Michelle Cartwright; Martin J. Shepperd

Growing awareness of how missing data affect software predictive accuracy has led to an increasing number of missing data techniques (MDTs). This paper investigates the robustness and accuracy of eight popular techniques for tolerating incomplete training and test data using tree-based models. MDTs were compared by artificially simulating different proportions, patterns, and mechanisms of missing data. A 4-way repeated measures design was employed to analyze the data. The simulation results suggest important differences. Listwise deletion is substantially inferior, while multiple imputation (MI) represents a superior approach to handling missing data. Decision tree single imputation and surrogate variable splitting are more severely impacted by missing values distributed among all attributes. MI should be used if the data contain many missing values. If few values are missing, any of the MDTs might be considered. The choice of technique should be guided by the pattern and mechanism of the missing data.
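Benchmarking MDTs this way requires injecting missing values into complete data under a controlled proportion and mechanism. A sketch of an MCAR and a simple MAR generator; the particular MAR rule (missingness in one feature driven by the observed value of another) is an illustrative assumption:

```python
# Sketch of injecting missingness into a complete data set under two
# mechanisms, as done when benchmarking MDTs via simulation.
import numpy as np

rng = np.random.default_rng(1)
data = rng.normal(size=(100, 2))  # complete, invented data

def make_mcar(X, prop):
    X = X.copy()
    X[rng.random(X.shape) < prop] = np.nan  # independent of all values
    return X

def make_mar(X, prop):
    X = X.copy()
    # Probability that x2 is missing depends on the *observed* x1.
    p = prop * 2 * (X[:, 0] > np.median(X[:, 0]))
    X[rng.random(len(X)) < p, 1] = np.nan
    return X

for prop in (0.1, 0.25, 0.5):
    mcar, mar = make_mcar(data, prop), make_mar(data, prop)
    print(prop, np.isnan(mcar).mean(), np.isnan(mar).mean())
```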


IEEE International Software Metrics Symposium | 2001

Predicting with sparse data

Martin J. Shepperd; Michelle Cartwright

It is well known that effective prediction of project cost related factors is an important aspect of software engineering. Unfortunately, despite extensive research over more than 30 years, this remains a significant problem for many practitioners. A major obstacle is the absence of reliable and systematic historic data, yet this is a sine qua non for almost all proposed methods: statistical, machine learning or calibration of existing models. The authors describe their sparse data method (SDM), based upon a pairwise comparison technique and T. L. Saaty's (1980) Analytic Hierarchy Process. Our minimum data requirement is a single known point. The technique is supported by a software tool known as DataSalvage. We show, for data from two companies, how our approach adds value to unaided expert judgement by producing significantly more accurate and less biased results. A sensitivity analysis shows that our approach is robust to pairwise comparison errors. We then describe the results of a small usability trial with a practising project manager. From this empirical work we conclude that the technique is promising and may help overcome some of the present barriers to effective project prediction.
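The core of the sparse data method, deriving a ratio scale from pairwise comparisons and anchoring it on a single known value, follows the AHP's principal-eigenvector calculation. A sketch with an invented judgement matrix (DataSalvage itself is not reproduced):

```python
# Sketch of the AHP step behind the sparse data method: a reciprocal
# pairwise-comparison matrix yields relative weights (the principal
# eigenvector); one known effort value anchors the ratio scale.
import numpy as np

# Expert judgements: entry [i, j] says "project i is this many times
# bigger than project j"; the matrix is reciprocal by construction.
C = np.array([
    [1.0, 3.0, 0.5],
    [1/3, 1.0, 0.25],
    [2.0, 4.0, 1.0],
])

eigvals, eigvecs = np.linalg.eig(C)
w = np.real(eigvecs[:, np.argmax(np.real(eigvals))])
w = w / w.sum()  # relative sizes of the three projects

known_effort, known_index = 120.0, 0  # the single known data point
estimates = w * (known_effort / w[known_index])
print(estimates)  # project 0 recovers 120 exactly; others are scaled
```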

Collaboration


Dive into Michelle Cartwright's collaborations.

Top Co-Authors

Bhekisipho Twala, University of Johannesburg

Qinbao Song, Xi'an Jiaotong University

Bheki Twala, Brunel University London

Carolyn Mair, Southampton Solent University