Goran Mauša | Researchain

Publication

Featured research published by Goran Mauša.

International Conference on Software, Telecommunications and Computer Networks | 2014

Software defect prediction with Bug-Code analyzer - A data collection tool demo

Goran Mauša; Tihana Galinac Grbac; Bojana Dalbelo Bašić

The empirical software engineering research community aims to accumulate knowledge based on empirical studies of datasets obtained from real software projects. A limiting factor in building theory on this accumulated knowledge is often dataset bias. One solution is to develop a systematic data collection procedure, with standard guidelines available to the open community, that reduces data collection bias. In this paper we present a tool demonstration that implements a systematic data collection procedure for software defect prediction datasets from open source bug tracking and source code management repositories. The main challenge the tool addresses is linking information related to the same entity (e.g. class file) from these two sources. The tool implements interfaces to bug and source code repositories, as well as to other tools for calculating software metrics. Finally, it lets users create software defect prediction datasets even if they are unaware of all the details behind this complex task.

Computer Science and Information Systems | 2016

A systematic data collection procedure for software defect prediction

Goran Mauša; Tihana Galinac Grbac; Bojana Dalbelo Bašić

Software defect prediction research relies on data that must be collected from otherwise separate repositories. To achieve greater generalization of the results, standardized protocols for data collection and validation are necessary. This paper presents an exhaustive survey of techniques and approaches used in the data collection process. It identifies some of the issues that must be addressed to minimize dataset bias and provides a number of measures that can help researchers compare their data collection approaches and evaluate their data quality. Moreover, we present a data collection procedure that uses a bug-code linking technique based on regular expressions. A detailed comparison with a number of popular data collection approaches and their publicly available datasets, together with a root cause analysis of the inconsistencies, reveals that our procedure achieves the most favorable results. Finally, we implement our data collection procedure in a data collection tool we name the Bug-Code (BuCo) Analyzer.
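
The abstract does not give the regular expression or the data model, so the following is only a minimal sketch of what regex-based bug-code linking can look like. The pattern `BUG_ID_PATTERN`, the function `link_bugs_to_files`, and the sample commits are all hypothetical and are not taken from the BuCo Analyzer.

```python
import re
from collections import defaultdict

# Hypothetical pattern (an assumption, not the one from the paper):
# a keyword such as "bug", "issue" or "fix(es/ed)", an optional "#",
# then a 4 to 7 digit bug ID.
BUG_ID_PATTERN = re.compile(
    r"\b(?:bug|issue|fix(?:es|ed)?)\s*#?(\d{4,7})\b", re.IGNORECASE
)


def link_bugs_to_files(commits, known_bug_ids):
    """Map each known bug ID to the set of files changed in commits that mention it.

    commits: iterable of (commit_message, list_of_changed_files)
    known_bug_ids: set of IDs exported from the bug tracker
    """
    links = defaultdict(set)
    for message, files in commits:
        for match in BUG_ID_PATTERN.finditer(message):
            bug_id = int(match.group(1))
            # Ignore numbers that are not real bug IDs in the tracker.
            if bug_id in known_bug_ids:
                # Using a set removes duplicated bug-file links.
                links[bug_id].update(files)
    return dict(links)


# Illustrative data only.
commits = [
    ("Fix for bug 123456: NPE in parser", ["src/Parser.java"]),
    ("Bug #123456 follow-up, also fixes 234567",
     ["src/Parser.java", "src/Lexer.java"]),
    ("Refactor build scripts", ["build.xml"]),
]
known = {123456, 234567, 999999}

links = link_bugs_to_files(commits, known)
# Bugs present in the tracker but never mentioned in a commit message.
untraceable = known - links.keys()
```

In this toy input one commit mentions two bugs and one bug is mentioned by two commits, so the resulting links are many-to-many, and bug 999999 remains unlinked.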

International Convention on Information and Communication Technology, Electronics and Microelectronics | 2015

Data collection for Software Defect Prediction - An exploratory case study of open source software projects

Goran Mauša; Tihana Galinac Grbac; Bojana Dalbelo Bašić

Empirical studies in Software Defect Prediction (SDP) are strongly biased by the quality of data and widely suffer from limited generalizability. The main reasons are the lack of data and the lack of systematic data collection procedures. Our research aims to produce the first systematically defined data collection procedure for SDP datasets that are obtained by linking separate development repositories. This paper is the first step towards that objective. We review the existing literature on approaches and tools used to collect SDP datasets, derive a detailed collection procedure, and test it in an exploratory study. We quantify the bias that may be caused by the issues we identified, and we review 35 tools for software product metrics collection. The most critical issues are the many-to-many relation in bug-file links, duplicated bug-file links, and untraceable bugs. Our research provides a more detailed, experience-based data collection procedure, which is crucial for further development of the SDP body of knowledge. Furthermore, our findings enabled us to develop an automatic data collection tool.

Conference on Computer as a Tool | 2013

Hill Climbing and simulated annealing in large scale next release problem

Goran Mauša; Tihana Galinac Grbac; Bojana Dalbelo Bašić; Mario-Osvin Pavcevic

The next release problem is a software engineering problem that has lately often been solved using heuristic algorithms. It deals with selecting a subset of requirements that should appear in the next release of a software product. The difficulty lies in satisfying the various parties interested in the project's development at an acceptable cost. This paper compares two rather simple, but widely used and efficient, heuristic algorithms: Hill Climbing and Simulated Annealing. The aim was to compare the performance of these algorithms and their modifications on a large-scale problem. We investigated the differences between four variations of Hill Climbing and two variations of Simulated Annealing, while Random Search was used to verify the benefit of using a heuristic algorithm. The evaluation was performed in terms of finding the best solution for a given budget and of the proportion of non-dominated solutions that form the joint Pareto-optimal front. Our research was done on publicly available realistic datasets obtained by mining bug repositories. The results indicate that Simulated Annealing is the more successful algorithm, but that Simulated Annealing together with Hill Climbing provides a more thorough insight into the problem search space.
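
As a rough illustration of the two heuristics on the single-objective "best solution for a given budget" formulation, here is a minimal sketch. The abstract does not describe the four Hill Climbing or two Simulated Annealing variations, so the first-improvement hill climber, the geometric cooling schedule, all parameter defaults, and the toy data are assumptions. The Pareto-front evaluation is not covered.

```python
import math
import random


def evaluate(selection, values, costs, budget):
    """Total value of the selected requirements, or -1 if the budget is exceeded."""
    cost = sum(c for s, c in zip(selection, costs) if s)
    if cost > budget:
        return -1
    return sum(v for s, v in zip(selection, values) if s)


def flip(selection, i):
    """Neighbour obtained by adding or removing requirement i."""
    neighbour = list(selection)
    neighbour[i] = not neighbour[i]
    return neighbour


def hill_climbing(values, costs, budget, rng):
    """First-improvement hill climbing starting from the empty release."""
    current = [False] * len(values)
    current_score = evaluate(current, values, costs, budget)
    improved = True
    while improved:
        improved = False
        # Visit neighbours in random order and take the first improving one.
        for i in rng.sample(range(len(values)), len(values)):
            candidate = flip(current, i)
            score = evaluate(candidate, values, costs, budget)
            if score > current_score:
                current, current_score = candidate, score
                improved = True
                break
    return current, current_score


def simulated_annealing(values, costs, budget, rng,
                        t_start=10.0, t_end=0.01, alpha=0.95, steps_per_t=20):
    """Simulated annealing with geometric cooling (all defaults are assumptions)."""
    current = [False] * len(values)
    current_score = evaluate(current, values, costs, budget)
    best, best_score = current, current_score
    t = t_start
    while t > t_end:
        for _ in range(steps_per_t):
            candidate = flip(current, rng.randrange(len(values)))
            score = evaluate(candidate, values, costs, budget)
            if score < 0:
                continue  # skip infeasible (over-budget) neighbours
            delta = score - current_score
            # Always accept improvements; accept worse moves with prob. exp(delta / t).
            if delta >= 0 or rng.random() < math.exp(delta / t):
                current, current_score = candidate, score
                if score > best_score:
                    best, best_score = candidate, score
        t *= alpha
    return best, best_score


# Toy instance (illustrative numbers only).
values = [10, 6, 8, 3, 7]
costs = [5, 4, 6, 2, 3]
budget = 10
rng = random.Random(42)
hc_solution, hc_score = hill_climbing(values, costs, budget, rng)
sa_solution, sa_score = simulated_annealing(values, costs, budget, rng)
print("Hill Climbing:", hc_score, "Simulated Annealing:", sa_score)
```

Hill Climbing stops at the first local optimum it reaches, while Simulated Annealing can occasionally accept a worse neighbour and so may escape it.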

Model and Data Engineering | 2017

The Stability of Threshold Values for Software Metrics in Software Defect Prediction

Goran Mauša; Tihana Galinac Grbac

Software metrics are used to measure complexity and quality in many empirical case studies. Recent studies have shown that threshold values can be detected for some metrics and used to predict defect-prone system modules. The goal of this paper is to empirically validate the stability of threshold values. Our aim is to analyze a wider set of software metrics than has been previously reported and to perform the analysis in the context of different levels of data imbalance. We replicate the case study of deriving thresholds for software metrics using a statistical model based on logistic regression. Furthermore, we analyze threshold stability in the context of varying levels of data imbalance. The methodology is validated using a large number of subsequent releases of open source projects. We found that threshold values of some metrics could be used to effectively predict defect-prone modules. Moreover, threshold values of some metrics may be influenced by the level of data imbalance. The results of this case study give valuable insight into the importance of software metrics, and the presented methodology may also be used by software quality assurance practitioners.
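
The abstract only says the thresholds come from "a statistical model based on logistic regression" and does not give the formula. One common way to do this, assumed here, is to fit a univariate model P(defect | x) = 1 / (1 + exp(-(alpha + beta * x))) and solve for the metric value at which the predicted probability reaches an acceptable risk level p0. The sketch below shows that calculation; the function names and the coefficient values are hypothetical.

```python
import math


def defect_probability(x, alpha, beta):
    """Univariate logistic model: probability that a module with metric value x is defective."""
    return 1.0 / (1.0 + math.exp(-(alpha + beta * x)))


def metric_threshold(alpha, beta, p0):
    """Metric value at which the modelled defect probability equals p0.

    Obtained by solving p0 = 1 / (1 + exp(-(alpha + beta * x))) for x.
    This is an assumed formulation, not necessarily the one used in the paper.
    """
    if not 0.0 < p0 < 1.0:
        raise ValueError("p0 must be strictly between 0 and 1")
    if beta == 0:
        raise ValueError("beta must be non-zero for a threshold to exist")
    return (math.log(p0 / (1.0 - p0)) - alpha) / beta


# Hypothetical coefficients from a fitted model (illustrative only).
alpha, beta = -3.0, 0.1
threshold = metric_threshold(alpha, beta, p0=0.5)
```

With a different p0, or with coefficients fitted on data of a different class balance, the threshold shifts, which is the kind of stability the study examines.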

European Conference on Software Architecture | 2016

On the distribution of software faults in evolution of complex systems

Tihana Galinac Grbac; Goran Mauša

Complex software systems and systems of systems have become essential in modern society, making their reliability one of the crucial problems in software engineering. As such systems are developed as a sequence of releases, it is important to understand their reliability behavior during evolution. There are many empirical principles regarding the distribution of faults within the system structure. All these principles are implied by the underlying probability distribution of faults. The aim of this paper is to find the probability distribution that best fits the empirical fault data from 21 versions of two evolutionarily developed open source systems, and to study how this distribution changes during system evolution.

International Convention on Information and Communication Technology, Electronics and Microelectronics | 2012