IEEE Transactions on Software Engineering | 2019

On the Multiple Sources and Privacy Preservation Issues for Heterogeneous Defect Prediction

Abstract


Heterogeneous defect prediction (HDP) refers to predicting the defect-proneness of software modules in a target project using heterogeneous metric data from other projects. Existing HDP methods mainly focus on predicting target instances from a single source. In practice, there are plenty of external projects, and multiple sources can generally provide more information than a single project. Therefore, it is meaningful to investigate whether HDP performance can be improved by employing multiple sources. However, a precondition of conducting HDP is that the external sources are available. Due to privacy concerns, most companies are not willing to share their data. To facilitate data sharing, it is essential to study how to protect the privacy of data owners before they release their data. In this paper, we study these two issues in HDP. Specifically, to utilize multiple sources effectively, we propose a multi-source selection based manifold discriminant alignment (MSMDA) approach. To protect the privacy of data owners, a sparse representation based double obfuscation algorithm is designed and applied to HDP. Through a case study of 28 projects, our results show that MSMDA can achieve better performance than a range of baseline methods. The improvement is 3.4-15.3 percent in g-measure and 3.0-19.1 percent in AUC.

Volume 45
Pages 391-411
DOI 10.1109/TSE.2017.2780222
Language English
Journal IEEE Transactions on Software Engineering
