Nikolaos Mittas
Aristotle University of Thessaloniki
Publications
Featured research published by Nikolaos Mittas.
IEEE Transactions on Software Engineering | 2013
Nikolaos Mittas; Lefteris Angelis
Software Cost Estimation can be described as the process of predicting the most realistic effort required to complete a software project. Because accurate effort estimates underpin many crucial project management activities, the research community has developed and applied a vast variety of methods and models aiming to improve the estimation procedure. This diversity of methods created the need for comparisons to determine the best model. However, inconsistent results have raised significant doubts about the appropriateness of the comparison process in experimental studies. Overall, there exist several potential sources of bias that have to be considered in order to strengthen confidence in the experiments. In this paper, we propose a statistical framework based on a multiple comparisons algorithm that ranks several cost estimation models, identifies those with significant differences in accuracy, and clusters them into non-overlapping groups. The proposed framework is applied in a large-scale setup comparing 11 prediction models over six datasets. The results illustrate the benefits and the significant information obtained through the systematic comparison of alternative methods.
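A minimal sketch of the kind of rank-and-cluster procedure the abstract describes, assuming each model's per-project absolute errors are available on the same projects. The greedy adjacent-merge and the paired sign-flip permutation test below are illustrative simplifications, not the paper's exact algorithm; all names and data are hypothetical:

```python
import random
import statistics

def perm_test_paired(errors_a, errors_b, n_perm=2000, seed=0):
    """Paired permutation test: randomly flip the sign of each per-project
    difference and compare the permuted mean with the observed one."""
    rng = random.Random(seed)
    diffs = [a - b for a, b in zip(errors_a, errors_b)]
    observed = abs(statistics.mean(diffs))
    hits = 0
    for _ in range(n_perm):
        flipped = [d if rng.random() < 0.5 else -d for d in diffs]
        if abs(statistics.mean(flipped)) >= observed:
            hits += 1
    return (hits + 1) / (n_perm + 1)

def rank_and_cluster(model_errors, alpha=0.05):
    """Rank models by mean absolute error, then merge each model into the
    previous group unless it differs significantly from its predecessor."""
    ranked = sorted(model_errors.items(),
                    key=lambda kv: statistics.mean(kv[1]))
    groups = [[ranked[0][0]]]
    for i in range(1, len(ranked)):
        p = perm_test_paired(ranked[i - 1][1], ranked[i][1])
        if p < alpha:
            groups.append([ranked[i][0]])
        else:
            groups[-1].append(ranked[i][0])
    return groups

# Hypothetical per-project absolute errors for three models A, B, C.
rng = random.Random(1)
base = [rng.uniform(0.5, 1.5) for _ in range(40)]
errors = {"A": base,
          "B": [e + rng.uniform(-0.1, 0.1) for e in base],
          "C": [e + 4.0 for e in base]}
print(rank_and_cluster(errors))  # C lands alone in the worst group
```

Models whose errors are statistically indistinguishable end up in the same non-overlapping group, which is the output the framework uses to compare 11 models across datasets.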
Information & Software Technology | 2008
Nikolaos Mittas; Marinos Athanasiades; Lefteris Angelis
Estimation by analogy (EbA) is a well-known technique for software cost estimation. The method's popularity is due to its straightforwardness and its intuitively appealing interpretation. However, despite its simplicity in application, the theoretical study of EbA is quite complicated. In this paper, we exploit the relation of the EbA method to nearest-neighbor non-parametric regression in order to suggest a resampling procedure, known as iterated bagging, for reducing the prediction error. The improvement that iterated bagging brings to EbA is validated on both artificial and real datasets from the literature, with very promising results.
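The nearest-neighbor view of EbA can be sketched as follows: predict a target project's effort from its k most similar historical projects, and reduce variance by averaging over bootstrap resamples. Note this sketch shows plain bagging; the paper's iterated bagging additionally trains later stages on out-of-bag residuals. The project data and feature names are hypothetical:

```python
import math
import random
import statistics

def analogy_estimate(history, target, k=3):
    """Estimation by analogy: predict the effort of the target project as the
    mean effort of its k nearest historical projects in feature space."""
    ranked = sorted(history, key=lambda proj: math.dist(proj[0], target))
    return statistics.mean(effort for _, effort in ranked[:k])

def bagged_analogy(history, target, k=3, n_bags=30, seed=0):
    """Plain bagging: average the analogy estimates obtained from bootstrap
    resamples of the historical projects."""
    rng = random.Random(seed)
    estimates = [
        analogy_estimate([rng.choice(history) for _ in history], target, k)
        for _ in range(n_bags)
    ]
    return statistics.mean(estimates)

# Hypothetical projects: ((size_kloc, team_size), effort_person_months)
history = [((10, 2), 100.0), ((12, 3), 120.0),
           ((30, 8), 400.0), ((32, 9), 430.0)]
print(analogy_estimate(history, (11, 2), k=2))  # → 110.0
print(bagged_analogy(history, (11, 2), k=2))
```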
Journal of Systems and Software | 2008
Nikolaos Mittas; Lefteris Angelis
Accurate software cost prediction is a research topic that has attracted much interest from the software engineering community in recent decades. A large part of the research effort involves the development of statistical models based on historical data. Since many models can be fitted to a given dataset, a crucial issue is the selection of the most efficient prediction model. Most often this selection is based on comparisons of various accuracy measures that are functions of the models' relative errors. However, the usual practice is to consider the model providing the best accuracy measure as the most accurate, without testing whether this superiority is in fact statistically significant. This policy can lead to unstable and erroneous conclusions, since a small change in the data can overturn the selection of the best model. On the other hand, the accuracy measures used in practice are statistics with unknown probability distributions, making hypothesis testing by traditional parametric methods problematic. In this paper, the use of statistical simulation tools is proposed in order to test the significance of the difference between the accuracies of two prediction methods: regression and estimation by analogy. The statistical simulation procedures involve permutation tests and bootstrap techniques for the construction of confidence intervals for the difference of measures. Four known datasets are used for experimentation in order to validate the results and to compare the simulation methods with traditional parametric and non-parametric procedures.
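The two simulation tools the abstract names can be sketched for a common accuracy measure, MMRE (mean magnitude of relative error): a paired permutation test for the difference in MMRE between two methods, and a percentile bootstrap confidence interval for that difference. This is a generic illustration under assumed data, not the paper's exact experimental setup:

```python
import random
import statistics

def mre(actuals, preds):
    """Magnitude of relative error per project: |actual - predicted| / actual."""
    return [abs(a - p) / a for a, p in zip(actuals, preds)]

def perm_test_mmre(actuals, preds_a, preds_b, n_perm=2000, seed=0):
    """Paired permutation test for the difference in MMRE between two methods:
    each project's pair of MREs is swapped with probability 1/2."""
    rng = random.Random(seed)
    ea, eb = mre(actuals, preds_a), mre(actuals, preds_b)
    observed = abs(statistics.mean(ea) - statistics.mean(eb))
    hits = 0
    for _ in range(n_perm):
        pairs = [(y, x) if rng.random() < 0.5 else (x, y)
                 for x, y in zip(ea, eb)]
        sa, sb = zip(*pairs)
        if abs(statistics.mean(sa) - statistics.mean(sb)) >= observed:
            hits += 1
    return (hits + 1) / (n_perm + 1)

def bootstrap_ci(actuals, preds_a, preds_b, n_boot=2000, seed=0, level=0.95):
    """Percentile bootstrap confidence interval for MMRE(A) - MMRE(B),
    resampling the per-project differences with replacement."""
    rng = random.Random(seed)
    diffs = [a - b for a, b in zip(mre(actuals, preds_a),
                                   mre(actuals, preds_b))]
    means = sorted(statistics.mean(rng.choices(diffs, k=len(diffs)))
                   for _ in range(n_boot))
    tail = (1 - level) / 2
    return means[int(tail * n_boot)], means[int((1 - tail) * n_boot) - 1]

# Hypothetical actual efforts and two sets of predictions.
actuals = list(range(100, 120))
preds_a = [a * 1.05 for a in actuals]   # consistently 5% off
preds_b = [a * 1.50 for a in actuals]   # consistently 50% off
print(perm_test_mmre(actuals, preds_a, preds_b))
print(bootstrap_ci(actuals, preds_a, preds_b))
```

A confidence interval that excludes zero tells the same story as a small permutation p-value: the difference in accuracy is unlikely to be a sampling artifact.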
Empirical Software Engineering | 2010
Nikolaos Mittas; Lefteris Angelis
The importance of Software Cost Estimation at the early stages of the development life cycle is clearly reflected in the many models and methods that have appeared in the literature so far. Researchers' interest has focused on two well-known techniques, namely the parametric Regression Analysis and the non-parametric Estimation by Analogy. Despite several comparative studies, there is still no consensus on which of the two is the better prediction technique. In this paper, we introduce a semi-parametric technique, called LSEbA, that combines the aforementioned methods while retaining the advantages of both approaches. Furthermore, the proposed method is consistent with the mixed nature of Software Cost Estimation data and exploits all of the information in the dataset, even when a large amount of values is missing. The paper analytically illustrates the process of building such a model and presents experimentation on three representative datasets, verifying the benefits of the proposed model in terms of accuracy, bias and spread. LSEbA is compared with linear regression, estimation by analogy, and a combination of the two based on the average of their outcomes, using accuracy metrics, statistical tests and a graphical tool, the Regression Error Characteristic curves.
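One of the baselines mentioned above, the averaging combination of regression and analogy, is simple to sketch (LSEbA itself is more involved). Assuming a single size feature and hypothetical data, least squares supplies the parametric estimate and a nearest-neighbor mean supplies the non-parametric one:

```python
import statistics

def fit_linear(xs, ys):
    """Ordinary least squares with a single predictor: (slope, intercept)."""
    mx, my = statistics.mean(xs), statistics.mean(ys)
    slope = (sum((x - mx) * (y - my) for x, y in zip(xs, ys))
             / sum((x - mx) ** 2 for x in xs))
    return slope, my - slope * mx

def analogy(xs, ys, x, k=2):
    """One-dimensional estimation by analogy: mean effort of the k projects
    whose size is closest to the target's."""
    nearest = sorted(zip(xs, ys), key=lambda pair: abs(pair[0] - x))[:k]
    return statistics.mean(y for _, y in nearest)

def combined(xs, ys, x, k=2):
    """Averaging baseline: mean of the regression and analogy estimates."""
    slope, intercept = fit_linear(xs, ys)
    return ((slope * x + intercept) + analogy(xs, ys, x, k)) / 2

# Hypothetical sizes (KLOC) and efforts (person-months).
sizes = [1.0, 2.0, 3.0, 4.0]
efforts = [10.0, 20.0, 30.0, 40.0]
print(combined(sizes, efforts, 2.5))  # → 25.0
```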
Empirical Software Engineering and Measurement | 2013
Damir Azhar; Patricia Riddle; Emilia Mendes; Nikolaos Mittas; Lefteris Angelis
Background: Despite the number of Web effort estimation techniques investigated, there is no consensus as to which technique produces the most accurate estimates, an issue shared by effort estimation in the general software estimation domain. A previous study in this domain has shown that ensembles of estimation techniques can address this issue. Aim: The aim of this paper is to investigate whether ensembles of effort estimation techniques will be similarly successful when used on Web project data. Method: The previous study built ensembles using solo effort estimation techniques that were deemed superior. In order to identify these superior techniques, two approaches were investigated: the first involved replicating the methodology used in the previous study, while the second used the Scott-Knott algorithm. Both approaches used the same 90 solo estimation techniques on Web project data from the Tukutuku dataset. The replication identified 16 solo techniques that were deemed superior and were used to build 15 ensembles, while the Scott-Knott algorithm identified 19 superior solo techniques that were used to build two ensembles. Results: The ensembles produced by both approaches performed very well against solo effort estimation techniques. With the replication, the top 12 techniques were all ensembles, with the remaining 3 ensembles falling within the top 17 techniques. These 15 effort estimation ensembles, along with the 2 built by the second approach, were grouped into the best cluster of effort estimation techniques by the Scott-Knott algorithm. Conclusion: While it may not be possible to identify a single best technique, the results suggest that ensembles of estimation techniques consistently perform well even when using Web project data.
Journal of Systems and Software | 2010
Nikolaos Mittas; Lefteris Angelis
The well-balanced management of a software project is a critical task accomplished at the early stages of the development process. Due to this requirement, a wide variety of prediction methods has been introduced in order to identify the best strategy for software cost estimation. The selection of the best technique is usually based on measures of error whereas in more recent studies researchers use formal statistical procedures. The former approach can lead to unstable and erroneous results due to the existence of outlying points whereas the latter cannot be easily presented to non-experts and has to be carried out by an expert with statistical background. In this paper, we introduce the regression error characteristic (REC) analysis, a powerful visualization tool with interesting geometrical properties, in order to validate and compare different prediction models easily, by a simple inspection of a graph. Moreover, we propose a formal framework covering different aspects of the estimation process such as the calibration of the prediction methodology, the identification of factors that affect the error, the investigation of errors on certain ranges of the actual cost and the examination of the distribution of the cost for certain errors. Application of REC analysis to the ISBSG10 dataset for comparing estimation by analogy and linear regression illustrates the benefits and the significant information obtained.
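The REC curve itself has a compact definition: for each error tolerance on a grid, plot the fraction of projects whose prediction error does not exceed that tolerance. A minimal sketch, with hypothetical errors; the curve that climbs toward 1 fastest corresponds to the more accurate model:

```python
def rec_curve(errors, n_points=50):
    """Regression Error Characteristic curve: for each error tolerance on a
    grid, the fraction of projects whose absolute error does not exceed it."""
    errs = sorted(abs(e) for e in errors)
    tolerances = [errs[-1] * i / (n_points - 1) for i in range(n_points)]
    accuracy = [sum(e <= t for e in errs) / len(errs) for t in tolerances]
    return tolerances, accuracy

# Hypothetical absolute errors of a prediction model on four projects.
tolerances, accuracy = rec_curve([1, 2, 3, 4], n_points=5)
print(list(zip(tolerances, accuracy)))
# → [(0.0, 0.0), (1.0, 0.25), (2.0, 0.5), (3.0, 0.75), (4.0, 1.0)]
```

Substituting relative errors (or any other error function) for absolute errors yields the adjusted variants discussed in the paper.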
Empirical Software Engineering and Measurement | 2008
Nikolaos Mittas; Lefteris Angelis
Software Cost Estimation is the task of predicting the effort or productivity required to complete a software project. Two of the best-known techniques in the literature so far are Regression Analysis and Estimation by Analogy. The results of empirical studies show a lack of convergence in choosing the best prediction technique between the parametric Regression Analysis and the non-parametric Estimation by Analogy models. In this paper, we introduce a semi-parametric model that incorporates parametric information into a non-parametric model, combining in this way regression and analogy. Furthermore, we demonstrate the procedure of building such a model on two well-known datasets and present comparative results based on the predictive accuracy of the new technique using several accuracy measures. We also perform statistical tests on the residuals in order to assess the improvement in the predictions attained through the new semi-parametric model, in comparison to the accuracy of Regression Analysis and Estimation by Analogy applied separately. Our results show that the semi-parametric model provides more accurate predictions than either the parametric or the non-parametric approach alone.
Information & Software Technology | 2015
Nikolaos Mittas; Ioannis Mamalikidis; Lefteris Angelis
Context: The importance of accurate predictions in Software Cost Estimation, and the related challenging research problems, have led to the introduction of a plethora of methodologies in the literature. However, the wide variety of cost estimation methods, the techniques for improving them and the different measures of accuracy have caused new problems, such as inconsistent findings and conclusion instability. Today there is confusion regarding the choice of the most appropriate method for a specific dataset, and therefore a need for well-established statistical frameworks, as well as for automated tools that reinforce and guide a comprehensive experimentation and comparison process based on the thorough study of cost estimation errors. Objective: The purpose of this paper is to present a framework for the visualization and statistical comparison of the errors of several cost estimation methods. It is based on an automated tool which can facilitate strategies for intelligent decision-making. Method: A systematic procedure comprising a series of steps corresponding to research questions is proposed. For each of the steps, StatREC, a graphical-user-interface statistical toolkit, is utilized. StatREC was designed and developed to take as input a simple data matrix of predictions by multiple models and to provide a variety of graphical tools and statistical hypothesis tests that help users answer the questions and choose the appropriate model themselves. Results: The study of prediction errors with the proposed framework provides insight into several aspects of the prediction performance of different models. The systematic examination of candidate models through a series of research questions supports the user in making the final decision. Conclusion: Structured procedures based on automated tools like StatREC can be used efficiently for studying the error and comparing cost estimation models.
Empirical Software Engineering | 2012
Nikolaos Mittas; Lefteris Angelis
Background: Regression Error Characteristic (REC) curves provide a visualization tool able to characterize graphically the prediction power of alternative predictive models. Owing to the benefits of such a visual description of the whole error distribution, REC analysis was recently introduced in software cost estimation to aid the choice of the most appropriate cost estimation model during the management of a forthcoming project. Aims: Although significant information can be retrieved from a readable graph, REC curves cannot assess whether the divergences between the alternative error functions constitute evidence of a statistically significant difference. Method: In this paper, we propose a graphical procedure that utilizes (a) repetitive permutations and (b) the maximum vertical deviation between two comparative Regression Error Characteristic curves in order to conduct a hypothesis test assessing the statistical significance of the difference between error functions. Results: In our case studies, the data come from software projects and the models compared are cost prediction models. The results clearly show that the proposed statistical test is necessary for assessing the significance of a prediction model's superiority, since it provides an objective criterion for the distances between the REC curves. Moreover, the procedure can easily be applied to any dataset where the objective is the prediction of a response variable of interest and the comparison of alternative prediction techniques in order to select the best strategy. Conclusions: The proposed hypothesis test, accompanied by an informative graphical tool, is more easily interpretable than conventional parametric and non-parametric statistical procedures. Moreover, it is free from normality assumptions on the error distributions when the samples are small and highly skewed. Finally, the proposed graphical test can be applied to comparisons of any alternative prediction methods and models, and also to any other validation procedure.
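Since the REC curve of a set of absolute errors is their empirical CDF, the maximum vertical deviation between two REC curves is a two-sample KS-type statistic, and its significance can be assessed by permutation. The sketch below uses an unpaired pooled reshuffle for simplicity, which is a simplification of the paper's repetitive-permutation procedure; the error samples are hypothetical:

```python
import random

def max_rec_deviation(errors_a, errors_b):
    """Maximum vertical distance between two REC curves, i.e. between the
    empirical CDFs of the absolute errors (a two-sample KS-type statistic)."""
    a = sorted(abs(e) for e in errors_a)
    b = sorted(abs(e) for e in errors_b)

    def cdf(sample, t):
        return sum(e <= t for e in sample) / len(sample)

    return max(abs(cdf(a, t) - cdf(b, t)) for t in sorted(set(a) | set(b)))

def rec_permutation_test(errors_a, errors_b, n_perm=1000, seed=0):
    """Permutation test: repeatedly reshuffle the pooled errors into two
    groups and count permuted deviations at least as large as the observed."""
    rng = random.Random(seed)
    observed = max_rec_deviation(errors_a, errors_b)
    pooled = list(errors_a) + list(errors_b)
    n, hits = len(errors_a), 0
    for _ in range(n_perm):
        rng.shuffle(pooled)
        if max_rec_deviation(pooled[:n], pooled[n:]) >= observed:
            hits += 1
    return (hits + 1) / (n_perm + 1)

# Hypothetical error samples: model B's errors sit well above model A's.
errs_a = [i / 10 for i in range(1, 21)]
errs_b = [5 + i / 10 for i in range(1, 21)]
print(max_rec_deviation(errs_a, errs_b))  # → 1.0
print(rec_permutation_test(errs_a, errs_b))
```

A small p-value indicates that the visible gap between the two REC curves is unlikely under the hypothesis that both models draw errors from the same distribution.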
Software Engineering and Advanced Applications | 2008
Nikolaos Mittas; Lefteris Angelis
A crucial issue in the software cost estimation area that has attracted the interest of software project managers is the selection of the best prediction method for estimating the cost of a project. Most prediction techniques estimate the cost from historical data. The selection of the best model is based on accuracy measures that are functions of the predictive error, whereas the significance of the differences can be evaluated through statistical procedures. However, statistical tests cannot easily be applied by non-experts, and there are difficulties in the interpretation of their results. The purpose of this paper is to introduce the utilization of a visualization tool, the regression error characteristic curves, to compare different prediction models easily, by simple inspection of a graph. Moreover, these curves are adjusted to accuracy measures that have appeared in the software cost estimation literature, and the experimentation is based on two well-known datasets.