Carolyn Mair
Southampton Solent University
Publication
Featured research published by Carolyn Mair.
Journal of Systems and Software | 2000
Carolyn Mair; Gada F. Kadoda; Martin Lefley; Keith Phalp; Chris Schofield; Martin J. Shepperd; Steve Webster
Traditionally, researchers have used either off-the-shelf models such as COCOMO, or developed local models using statistical techniques such as stepwise regression, to obtain software effort estimates. More recently, attention has turned to a variety of machine learning methods such as artificial neural networks (ANNs), case-based reasoning (CBR) and rule induction (RI). This paper outlines some comparative research into the use of these three machine learning methods to build software effort prediction systems. We briefly describe each method and then apply the techniques to a dataset of 81 software projects derived from a Canadian software house in the late 1980s. We compare the prediction systems in terms of three factors: accuracy, explanatory value and configurability. We show that ANN methods have superior accuracy and that RI methods are least accurate. However, this view is somewhat counteracted by problems with explanatory value and configurability. For example, we found that considerable effort was required to configure the ANN and that this compared very unfavourably with the other techniques, particularly CBR and least squares regression (LSR). We suggest that further work be carried out, both to further explore interaction between the end-user and the prediction system, and also to facilitate configuration, particularly of ANNs.
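To illustrate the analogy-based idea behind one of the techniques compared above, the following is a minimal sketch of case-based reasoning for effort estimation: predict a new project's effort from its k most similar past projects. It is not the prediction system built in the paper; the features, project values and choice of k are invented for illustration.

```python
# Minimal sketch of case-based reasoning (analogy-based) effort estimation:
# predict a new project's effort as the mean effort of its k nearest
# neighbours among past projects. Feature names and values are illustrative,
# not taken from the dataset used in the paper.
import numpy as np

def cbr_estimate(past_features, past_effort, new_features, k=3):
    """Estimate effort for one new project from its k most similar past projects."""
    # Min-max normalise features so no single attribute dominates the distance.
    lo, hi = past_features.min(axis=0), past_features.max(axis=0)
    span = np.where(hi > lo, hi - lo, 1.0)
    past_norm = (past_features - lo) / span
    new_norm = (new_features - lo) / span

    # Euclidean distance to every past project, then average the k closest efforts.
    dists = np.linalg.norm(past_norm - new_norm, axis=1)
    nearest = np.argsort(dists)[:k]
    return past_effort[nearest].mean()

# Toy data: columns might be size (function points), team size, duration (months).
past_features = np.array([[120, 5, 6], [300, 9, 12], [80, 3, 4], [210, 7, 9]], float)
past_effort = np.array([14.0, 40.0, 8.0, 26.0])   # person-months
print(cbr_estimate(past_features, past_effort, np.array([150, 6, 7], float)))
```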
IEEE Transactions on Software Engineering | 2006
Qinbao Song; Martin J. Shepperd; Michelle Cartwright; Carolyn Mair
Much current software defect prediction work focuses on the number of defects remaining in a software system. In this paper, we present association rule mining-based methods to predict defect associations and defect correction effort, to help developers detect software defects and assist project managers in allocating testing resources more effectively. We applied the proposed methods to the SEL defect data, which consist of more than 200 projects spanning more than 15 years. The results show that, for defect association prediction, the accuracy is very high and the false-negative rate is very low. Likewise, the accuracy of both defect isolation effort prediction and defect correction effort prediction is also high. We compared the defect correction effort prediction method with other types of methods - PART, C4.5, and Naive Bayes - and show that accuracy is improved by at least 23 percent. We also evaluated the impact of support and confidence levels on prediction accuracy, false-negative rate, false-positive rate, and the number of rules. We found that higher support and confidence levels may not result in higher prediction accuracy, and that a sufficient number of rules is a precondition for high prediction accuracy.
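As an illustration of the rule-mining idea, the following sketch derives simple defect-association rules filtered by support and confidence thresholds. It is not the paper's method or the SEL data; the defect categories, transactions and thresholds are invented.

```python
# Illustrative sketch of mining association rules over defect types and filtering
# them by support and confidence. The defect categories and "transactions" below
# are made up; the paper's SEL data and exact algorithm are not reproduced.
from itertools import combinations
from collections import Counter

# Each transaction lists the defect types found together in one module/fix.
transactions = [
    {"interface", "logic"},
    {"interface", "logic", "data"},
    {"logic", "data"},
    {"interface", "data"},
    {"interface", "logic"},
]

min_support, min_confidence = 0.4, 0.7
n = len(transactions)

# Occurrence counts for single defect types and for pairs.
item_counts = Counter(item for t in transactions for item in t)
pair_counts = Counter(pair for t in transactions for pair in combinations(sorted(t), 2))

# Emit rules antecedent -> consequent that clear both thresholds.
for (a, b), count in pair_counts.items():
    support = count / n
    if support < min_support:
        continue
    for lhs, rhs in ((a, b), (b, a)):
        confidence = count / item_counts[lhs]
        if confidence >= min_confidence:
            print(f"{lhs} -> {rhs}  support={support:.2f}  confidence={confidence:.2f}")
```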
IEEE Transactions on Software Engineering | 2013
Martin J. Shepperd; Qinbao Song; Zhongbin Sun; Carolyn Mair
Background--Self-evidently, empirical analyses rely upon the quality of their data. Likewise, replications rely upon accurate reporting and on using the same, rather than merely similar, versions of datasets. In recent years, there has been much interest in using machine learners to classify software modules into defect-prone and not defect-prone categories. The publicly available NASA datasets have been extensively used as part of this research. Objective--This short note investigates the extent to which published analyses based on the NASA defect datasets are meaningful and comparable. Method--We analyze the five studies published in the IEEE Transactions on Software Engineering since 2007 that have utilized these datasets and compare the two versions of the datasets currently in use. Results--We find important differences between the two versions of the datasets, implausible values in one dataset and generally insufficient detail documented on dataset preprocessing. Conclusions--It is recommended that researchers 1) indicate the provenance of the datasets they use, 2) report any preprocessing in sufficient detail to enable meaningful replication, and 3) invest effort in understanding the data prior to applying machine learners.
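The kind of dataset scrutiny the note recommends can be sketched as a few sanity checks that compare two versions of a defect dataset and flag implausible values. The file paths and column names below are hypothetical, not the NASA MDP schema.

```python
# Minimal sketch of pre-analysis sanity checks on two versions of a defect
# dataset: compare shapes and columns, count duplicates, and flag implausible
# values. File names and column names (loc, defects) are hypothetical.
import pandas as pd

v1 = pd.read_csv("defect_data_version_a.csv")   # hypothetical paths
v2 = pd.read_csv("defect_data_version_b.csv")

print("rows/cols v1:", v1.shape, " v2:", v2.shape)
print("columns only in v1:", set(v1.columns) - set(v2.columns))
print("duplicate rows v1:", v1.duplicated().sum(), " v2:", v2.duplicated().sum())

# Implausible values: non-positive size, or a module with zero lines but defects.
for name, df in (("v1", v1), ("v2", v2)):
    bad_size = (df["loc"] <= 0).sum()
    zero_loc_defects = ((df["loc"] == 0) & (df["defects"] > 0)).sum()
    print(f"{name}: non-positive loc = {bad_size}, zero-loc modules with defects = {zero_loc_defects}")
```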
ieee international software metrics symposium | 2005
Qinbao Song; Martin Shepperd; Carolyn Mair
The inherent uncertainty of the software development process presents particular challenges for software effort prediction. We need to systematically address missing data values, feature subset selection and the continuous evolution of predictions as the project unfolds, all in the context of data starvation and noisy data. In this paper, however, we focus particularly on feature subset selection and effort prediction at an early stage of a project. We propose a novel approach using grey relational analysis (GRA) from grey system theory (GST), a recently developed systems engineering theory based on the uncertainty of small samples. In this work we address some of the theoretical challenges in applying GRA to feature subset selection and effort prediction, and then evaluate our approach on five publicly available industrial data sets, using stepwise regression as a benchmark. The results are very encouraging in the sense of being comparable to or better than those of other machine learning techniques, and thus indicate that the method has considerable potential.
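For readers unfamiliar with grey relational analysis, the following is a minimal sketch of how grey relational grades could be computed to rank candidate features against effort. It follows the standard textbook formulation with distinguishing coefficient rho = 0.5; the project data are invented and this is not the paper's exact procedure.

```python
# Minimal sketch of grey relational analysis (GRA) used to rank features by how
# closely they track the target (effort). rho = 0.5 is the common textbook
# choice for the distinguishing coefficient; the project data are invented.
import numpy as np

def grey_relational_grades(features, target, rho=0.5):
    """Return one grey relational grade per feature column."""
    # Min-max normalise the target and each feature onto [0, 1].
    def norm(x):
        lo, hi = x.min(axis=0), x.max(axis=0)
        return (x - lo) / np.where(hi > lo, hi - lo, 1.0)

    x0 = norm(target)                 # reference sequence
    xi = norm(features)               # comparison sequences, one per column
    delta = np.abs(xi - x0[:, None])  # deviation sequences
    dmin, dmax = delta.min(), delta.max()
    coeff = (dmin + rho * dmax) / (delta + rho * dmax)  # grey relational coefficients
    return coeff.mean(axis=0)         # grade = mean coefficient per feature

# Toy data: 6 projects, 3 candidate features, effort (person-months) as target.
features = np.array([[100, 4, 2], [250, 8, 3], [80, 3, 1],
                     [300, 9, 4], [150, 5, 2], [200, 7, 3]], float)
effort = np.array([12.0, 35.0, 9.0, 44.0, 18.0, 28.0])
grades = grey_relational_grades(features, effort)
print("feature ranking (best first):", np.argsort(-grades))
```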
model driven engineering languages and systems | 2005
Carolyn Mair; Martin J. Shepperd; Magne Jørgensen
OBJECTIVE - to build up a picture of the nature and type of data sets being used to develop and evaluate different software project effort prediction systems. We believe this to be important since there is a growing body of published work that seeks to assess different prediction approaches. METHOD - we performed an exhaustive search, from 1980 onwards, of three software engineering journals for research papers that used project data sets to compare cost prediction systems. RESULTS - this identified a total of 50 papers that used, one or more times, a total of 71 unique project data sets. We observed that some of the better known and easily accessible data sets were used repeatedly, making them potentially disproportionately influential. Such data sets also tend to be amongst the oldest, with potential problems of obsolescence. We also note that only about 60% of all data sets are in the public domain. Finally, extracting relevant information from research papers has been time consuming due to different styles of presentation and levels of contextual information. CONCLUSIONS - first, the community needs to consider the quality and appropriateness of the data set being utilised; not all data sets are equal. Second, we need to assess the way results are presented in order to facilitate meta-analysis, and to consider whether a standard protocol would be appropriate.
Journal of Further and Higher Education | 2012
Carolyn Mair
There exists broad agreement on the value of reflective practice for personal and professional development. However, many students in higher education (HE) struggle with the concept of reflection, so they do not engage well with the process, and its full value is seldom realised. An online resource was developed to facilitate and structure the recording, storage and retrieval of reflections with the focus on facilitating reflective writing, developing metacognitive awareness and, ultimately, enhancing learning. Ten undergraduate students completed a semi-structured questionnaire prior to participating in a focus group designed to elicit a common understanding of reflective practice. They maintained reflective practice online for 6 weeks and participated in post-study individual interviews. Findings provide evidence for the positive acceptance, efficiency and effectiveness of the intervention. Using a structured approach to online reflective practice is empowering and ultimately enhances undergraduate learning through the development of metacognition.
hawaii international conference on system sciences | 2012
Carolyn Mair; Miriam Martincova; Martin J. Shepperd
BACKGROUND - whilst substantial effort has been invested in developing and evaluating knowledge-based techniques for project prediction, little is known about the interaction between them and expert users. OBJECTIVE - the aim is to explore the interaction of cognitive processes and personality of software project managers undertaking tool-supported estimation tasks such as effort and cost prediction. METHOD - we conducted personality profiling and observational studies using think-aloud protocols with five senior project managers using a case-based reasoning (CBR) tool to predict effort for real projects. RESULTS - we found pronounced differences between the participants in terms of individual differences, cognitive behaviour and estimation outcomes, although there was a general tendency for over-optimism and over-confidence. CONCLUSIONS - in order to improve task effectiveness in the workplace we need to understand the cognitive behaviour of software professionals in addition to conducting machine learning research.
Journal of Applied Research in Higher Education | 2016
Lalage Sanders; Carolyn Mair; Rachael James
Purpose – The purpose of this paper is to evaluate the use of two psychometric measures as predictors of end-of-year outcome for first-year university students. Design/methodology/approach – New undergraduates (n=537) were recruited in two contrasting universities, one arts-based and one science-based, in different cities in the UK. At the start of the academic year, new undergraduates across 30 programmes in the two institutions were invited to complete a survey comprising two psychometric measures: the Academic Behavioural Confidence scale and the Performance Expectation Ladder. Outcome data were collected from the examining boards the following summer, distinguishing those who were able to progress to the next year of study without further assessment from those who were not. Findings – Two of the four Confidence subscales, Attendance and Studying, showed significantly lower scores among students who were unable to progress the following June compared with those who could (p < 0.003). The Ladder data showed the less...
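As an illustration of the kind of group comparison reported in the findings, the sketch below applies an independent-samples (Welch's) t-test to one Confidence subscale for students who progressed versus those who did not. The scores are invented; this is not the study's actual analysis or data.

```python
# Illustrative sketch: compare an Academic Behavioural Confidence subscale
# (e.g. Studying) between students who progressed and those who did not,
# using Welch's independent-samples t-test. Scores below are invented.
from scipy import stats
import numpy as np

progressed = np.array([19, 21, 18, 22, 20, 23, 19, 21], float)       # subscale scores
not_progressed = np.array([15, 17, 14, 16, 18, 15], float)

t, p = stats.ttest_ind(progressed, not_progressed, equal_var=False)  # Welch's t-test
print(f"t = {t:.2f}, p = {p:.4f}")
```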
international symposium on empirical software engineering | 2005
Carolyn Mair; Martin J. Shepperd
workshop on emerging trends in software metrics | 2011
Carolyn Mair; Martin J. Shepperd