2021 11th International Conference on Cloud Computing, Data Science & Engineering (Confluence) | 2021

A Study on Machine Learning Applied to Software Bug Priority Prediction

Abstract


Bugs are among the top problems faced by software developers. As software projects grow in size and complexity, so do the number and complexity of their bugs. Bug priority prediction helps software developers focus their efforts on the most critical bugs, those that affect the core functionality of the software. Automating priority prediction can reduce the time spent analyzing new bug reports. In this paper, we extract bug reports from the bug tracking systems of six popular open-source projects (Hadoop, HBase, HDFS, Mesos, Spark, and MapReduce) and apply five machine learning classifiers (Multinomial Naive Bayes, Decision Tree, Logistic Regression, Random Forest, and AdaBoost) to automatically predict bug priority from the title, description, and summary of each bug report. We use tf-idf to extract features from the bug reports and measure classifier performance with precision, recall, and F1-score. Models are evaluated with stratified 10-fold cross-validation, and results are averaged over all 10 folds. We find that machine learning applied to bug priority prediction gives excellent results and can significantly reduce the time involved in bug prioritization. In our experiments, no single classifier consistently performs best on all priority levels and metrics across all datasets. However, the overall trends show that Multinomial Naive Bayes gives well-balanced performance and is fast to train and test. Logistic Regression and AdaBoost also perform well and are potential alternatives.
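The tf-idf step described in the abstract can be illustrated with a minimal, standard-library-only sketch. It is not the paper's implementation, which the abstract does not describe. Assumptions: lowercase whitespace tokenization, raw term counts as tf, the smoothed idf variant ln((1 + N) / (1 + df)) + 1, and L2 normalization. The function names `tokenize` and `tfidf` and the sample report titles are hypothetical.

```python
import math
from collections import Counter

def tokenize(text):
    # Assumption: lowercase + whitespace split; the paper's preprocessing is not given here.
    return text.lower().split()

def tfidf(docs):
    """Return one L2-normalized {term: weight} dict per document.

    tf  = raw count of the term in the document
    idf = ln((1 + N) / (1 + df)) + 1   (a smoothed variant; an assumption)
    """
    tokenized = [tokenize(d) for d in docs]
    n_docs = len(tokenized)
    df = Counter()
    for toks in tokenized:
        df.update(set(toks))  # count each term at most once per document
    idf = {t: math.log((1 + n_docs) / (1 + c)) + 1 for t, c in df.items()}
    vectors = []
    for toks in tokenized:
        weights = {t: c * idf[t] for t, c in Counter(toks).items()}
        norm = math.sqrt(sum(w * w for w in weights.values()))
        vectors.append({t: w / norm for t, w in weights.items()} if norm else weights)
    return vectors

# Hypothetical bug-report titles, for illustration only.
reports = ["NameNode crash on startup", "Typo in docs", "Crash when writing block"]
vecs = tfidf(reports)
# "crash" occurs in two reports, so it is weighted below the rarer "namenode".
print(vecs[0]["namenode"] > vecs[0]["crash"])  # True
```

Terms that appear in many reports receive a lower idf, so distinctive words dominate each report's vector.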
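Multinomial Naive Bayes, which the abstract singles out as well balanced and fast to train and test, can be sketched from scratch as follows. This is a generic textbook formulation with add-alpha (Laplace) smoothing, not the paper's code. The class interface, the `featurize` helper, and the "high"/"low" labels and training texts are hypothetical. Features are `{term: weight}` dicts, so the same model accepts raw counts or tf-idf weights.

```python
import math
from collections import Counter, defaultdict

class MultinomialNB:
    """Multinomial Naive Bayes with add-alpha (Laplace) smoothing.

    Each sample is a {term: non-negative weight} dict, so raw counts
    or tf-idf weights can both be used as features.
    """

    def __init__(self, alpha=1.0):
        self.alpha = alpha

    def fit(self, X, y):
        self.classes = sorted(set(y))
        priors = Counter(y)
        self.log_prior = {c: math.log(priors[c] / len(y)) for c in self.classes}
        totals = {c: defaultdict(float) for c in self.classes}
        self.vocab = set()
        for feats, label in zip(X, y):
            for term, w in feats.items():
                totals[label][term] += w
                self.vocab.add(term)
        self.log_lik = {}
        for c in self.classes:
            denom = sum(totals[c].values()) + self.alpha * len(self.vocab)
            self.log_lik[c] = {
                t: math.log((totals[c][t] + self.alpha) / denom) for t in self.vocab
            }
        return self

    def predict_one(self, feats):
        # Score = log P(c) + sum_t w_t * log P(t | c); terms unseen in training are skipped.
        scores = {
            c: self.log_prior[c]
            + sum(w * self.log_lik[c][t] for t, w in feats.items() if t in self.vocab)
            for c in self.classes
        }
        return max(self.classes, key=scores.get)

    def predict(self, X):
        return [self.predict_one(f) for f in X]

def featurize(text):
    # Hypothetical featurizer: whitespace-token counts.
    return Counter(text.lower().split())

# Hypothetical training data with illustrative "high"/"low" labels.
train_text = ["namenode crash data loss", "crash on write corrupts block",
              "typo in javadoc", "fix typo in log message"]
train_y = ["high", "high", "low", "low"]
model = MultinomialNB().fit([featurize(t) for t in train_text], train_y)
preds = model.predict([featurize("crash after restart"), featurize("typo in docs")])
print(preds)  # ['high', 'low']
```

Training reduces to counting and one logarithm per (class, term) pair, and prediction is a sum over the terms of a report, which is consistent with the abstract's observation that the classifier is fast.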
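The evaluation protocol in the abstract (stratified 10-fold cross-validation, with precision, recall, and F1-score per priority level averaged over the folds) can be sketched with the standard library alone. The round-robin fold assignment, the absence of shuffling, and the `train_and_predict(X_train, y_train, X_test)` callback interface are assumptions for illustration, as are the always-"high" baseline and the toy data in the usage example.

```python
from collections import defaultdict

def stratified_k_fold(labels, k=10):
    """Yield (train_indices, test_indices) for k stratified folds.

    Indices of each class are dealt round-robin across the folds, so every
    fold keeps roughly the overall class proportions. No shuffling is done
    here; a real experiment would typically shuffle within each class first.
    """
    by_class = defaultdict(list)
    for i, label in enumerate(labels):
        by_class[label].append(i)
    folds = [[] for _ in range(k)]
    pos = 0
    for label in sorted(by_class):
        for i in by_class[label]:
            folds[pos % k].append(i)
            pos += 1
    for fold in folds:
        held_out = set(fold)
        train = [i for i in range(len(labels)) if i not in held_out]
        yield train, sorted(fold)

def precision_recall_f1(y_true, y_pred, label):
    """One-vs-rest precision, recall and F1 for a single priority level."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == label and p == label)
    fp = sum(1 for t, p in pairs if t != label and p == label)
    fn = sum(1 for t, p in pairs if t == label and p != label)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f1

def cross_validate(X, y, train_and_predict, k=10):
    """Average per-level (precision, recall, F1) over k stratified folds."""
    levels = sorted(set(y))
    sums = {c: [0.0, 0.0, 0.0] for c in levels}
    for train, test in stratified_k_fold(y, k):
        preds = train_and_predict([X[i] for i in train], [y[i] for i in train],
                                  [X[i] for i in test])
        truth = [y[i] for i in test]
        for c in levels:
            for j, value in enumerate(precision_recall_f1(truth, preds, c)):
                sums[c][j] += value
    return {c: tuple(s / k for s in sums[c]) for c in levels}

def always_high(X_train, y_train, X_test):
    # Trivial stand-in for a real classifier: ignores its inputs.
    return ["high"] * len(X_test)

# Hypothetical usage: 20 reports with two priority levels.
y = ["high"] * 10 + ["low"] * 10
X = [f"report {i}" for i in range(20)]
scores = cross_validate(X, y, always_high, k=10)
print(scores["low"])  # (0.0, 0.0, 0.0)
```

Reporting metrics per priority level, rather than a single accuracy figure, exposes exactly the behavior the abstract describes: a classifier can look strong on one level while failing on another, as the degenerate baseline does here.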

Pages 965-970
DOI 10.1109/Confluence51648.2021.9377083
Language English
