Huanjing Wang
Western Kentucky University
Publications
Featured research published by Huanjing Wang.
Software: Practice and Experience | 2011
Kehan Gao; Taghi M. Khoshgoftaar; Huanjing Wang; Naeem Seliya
The selection of software metrics for building software quality prediction models is a search‐based software engineering problem. An exhaustive search for such metrics is usually not feasible due to limited project resources, especially if the number of available metrics is large. Defect prediction models are necessary in aiding project managers to better utilize valuable project resources for software quality improvement. The efficacy and usefulness of a fault‐proneness prediction model is only as good as the quality of the software measurement data. This study focuses on the problem of attribute selection in the context of software quality estimation. A comparative investigation is presented for evaluating our proposed hybrid attribute selection approach, in which feature ranking is first used to reduce the search space, followed by feature subset selection. A total of seven different feature ranking techniques are evaluated, while four different feature subset selection approaches are considered. The models are trained using five commonly used classification algorithms. The case study is based on software metrics and defect data collected from multiple releases of a large real‐world software system. The results demonstrate that while some feature ranking techniques performed similarly, the automatic hybrid search algorithm performed the best among the feature subset selection methods. Moreover, the performance of the defect prediction models either improved or remained unchanged when over 85% of the software attributes were eliminated.
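As a rough illustration of the two-stage hybrid idea the abstract describes (filter ranking to shrink the search space, then subset selection over the survivors), here is a minimal sketch assuming scikit-learn and synthetic data; the ranker, learner, and cutoff values are illustrative choices, not the paper's.

```python
from sklearn.datasets import make_classification
from sklearn.feature_selection import (SelectKBest, SequentialFeatureSelector,
                                       mutual_info_classif)
from sklearn.naive_bayes import GaussianNB

X, y = make_classification(n_samples=500, n_features=100,
                           n_informative=10, random_state=0)

# Stage 1: a filter ranker cuts the search space to the top 20 metrics.
ranker = SelectKBest(mutual_info_classif, k=20).fit(X, y)
X_top = ranker.transform(X)

# Stage 2: subset selection searches only the reduced space.
selector = SequentialFeatureSelector(GaussianNB(), n_features_to_select=5,
                                     cv=3).fit(X_top, y)
print(selector.transform(X_top).shape)  # (500, 5)
```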
International Conference on Machine Learning and Applications | 2010
Huanjing Wang; Taghi M. Khoshgoftaar; Amri Napolitano
Feature selection has become the essential step in many data mining applications. Using a single feature subset selection method may generate local optima. Ensembles of feature selection methods attempt to combine multiple feature selection methods instead of using a single one. We present a comprehensive empirical study examining 17 different ensembles of feature ranking techniques (rankers) including six commonly-used feature ranking techniques, the signal-to-noise filter technique, and 11 threshold-based feature ranking techniques. This study utilized 16 real-world software measurement data sets of different sizes and built 13,600 classification models. Experimental results indicate that ensembles of very few rankers are very effective and even better than ensembles of many or all rankers.
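The ensemble idea can be sketched as rank aggregation: each ranker scores every feature, scores become ranks, and ranks are averaged. A minimal sketch assuming scikit-learn and SciPy; the three rankers here stand in for the paper's larger pool and are illustrative only.

```python
import numpy as np
from scipy.stats import rankdata
from sklearn.datasets import make_classification
from sklearn.feature_selection import chi2, f_classif, mutual_info_classif

X, y = make_classification(n_samples=300, n_features=50, random_state=1)
X = X - X.min(axis=0)  # chi2 requires non-negative feature values

# Each ranker scores every feature; higher score = more relevant.
scores = [chi2(X, y)[0], f_classif(X, y)[0],
          mutual_info_classif(X, y, random_state=1)]

# Convert scores to ranks (1 = best) and aggregate by mean rank.
ranks = np.array([rankdata(-s) for s in scores])
top10 = np.argsort(ranks.mean(axis=0))[:10]
print("ensemble-selected features:", top10)
```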
Information Reuse and Integration | 2009
Kehan Gao; Taghi M. Khoshgoftaar; Huanjing Wang
Attribute selection is an important activity in data preprocessing for software quality modeling and other data mining problems. Software quality models have been used to improve the fault detection process. Finding faulty components in a software system during the early stages of the software development process can lead to a more reliable final product and can reduce development and maintenance costs. Some studies have shown that the prediction accuracy of the models improves when irrelevant and redundant features are removed from the original data set. In this study, we investigated four filter-based attribute selection techniques: Automatic Hybrid Search (AHS), Rough Sets (RS), Kolmogorov-Smirnov (KS), and Probabilistic Search (PS), and conducted experiments using them on a very large telecommunications software system. In order to evaluate the classification performance on the smaller subsets of attributes selected using the different approaches, we built several classification models using five different classifiers. The empirical results demonstrated that by applying an attribute selection approach we can build classification models with an accuracy comparable to that of models built with the complete set of attributes, while the smaller subset contains less than 15 percent of the complete set of attributes. Therefore, the metrics collection, model calibration, model validation, and model evaluation times of future software development efforts of similar systems can be significantly reduced. In addition, we demonstrated that our recently proposed attribute selection technique, KS, outperformed the other three techniques.
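The KS ranker scores each attribute by how well its distribution separates the two classes. A minimal sketch using scipy.stats.ks_2samp on synthetic data; this illustrates the idea, not the authors' implementation.

```python
import numpy as np
from scipy.stats import ks_2samp
from sklearn.datasets import make_classification

X, y = make_classification(n_samples=400, n_features=30, random_state=2)

# Score each metric by the KS statistic between its values in
# fault-prone (y == 1) and not-fault-prone (y == 0) modules.
ks_scores = np.array([ks_2samp(X[y == 1, j], X[y == 0, j]).statistic
                      for j in range(X.shape[1])])
print(np.argsort(-ks_scores)[:5])  # best-separating metrics first
```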
Granular Computing | 2010
Huanjing Wang; Taghi M. Khoshgoftaar; Jason Van Hulse
Given high-dimensional software measurement data, researchers and practitioners often use feature (metric) selection techniques to improve the performance of software quality classification models. This paper presents our newly proposed threshold-based feature selection techniques, comparing their performance by building classification models using five commonly used classifiers. In order to evaluate the effectiveness of the different feature selection techniques, the models are evaluated using eight different performance metrics separately, since a given performance metric usually captures only one aspect of classification performance. All experiments are conducted on three Eclipse data sets with different levels of class imbalance. The experiments demonstrate that the choice of performance metric may significantly influence the results. In this study, we found four distinct patterns when using eight performance metrics to order 11 threshold-based feature selection techniques. Moreover, the performance of the software quality models either improves or remains unchanged despite the removal of over 96% of the software metrics (attributes).
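The threshold-based idea treats each normalized attribute as a predicted probability of fault-proneness and scores it with a classification metric swept over all thresholds. A minimal sketch of an AUC-based variant, assuming scikit-learn; details of the paper's 11 techniques will differ.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.metrics import roc_auc_score

X, y = make_classification(n_samples=400, n_features=30, random_state=3)

def tbfs_auc(x, y):
    # Normalize the metric to [0, 1] and treat it as a predicted
    # probability of fault-proneness; AUC sweeps every threshold.
    x01 = (x - x.min()) / (x.max() - x.min())
    # Take the better orientation, since a metric may be inversely related.
    return max(roc_auc_score(y, x01), roc_auc_score(y, 1 - x01))

scores = np.array([tbfs_auc(X[:, j], y) for j in range(X.shape[1])])
print(np.argsort(-scores)[:5])  # top 5 attributes by AUC-based scoring
```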
Neurocomputing | 2012
Huanjing Wang; Taghi M. Khoshgoftaar; Amri Napolitano
Software defect prediction models are used to identify program modules that are high-risk, or likely to have a high number of faults. These models are built using software metrics that are collected during the software development process. Various techniques and approaches have been created for improving fault prediction, one of which is feature (metric) selection. Choosing the most relevant features is essential to improving the effectiveness of defect predictors. However, a single feature subset selection method may become trapped in local optima; ensembles of feature selection methods attempt to combine multiple feature selection methods instead of relying on a single one. In this paper, we present a comprehensive empirical study examining 17 different ensembles of feature ranking techniques (rankers), drawing on six commonly used feature ranking techniques, the signal-to-noise filter technique, and 11 threshold-based feature ranking techniques. This study utilized 16 real-world software measurement data sets of different sizes and built 54,400 classification models using four well-known classifiers. The main conclusion is that ensembles of very few rankers are very effective, and even better than ensembles of many or all rankers.
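The signal-to-noise ranker mentioned above has a simple closed form: the absolute difference of the class means divided by the sum of the class standard deviations. A minimal NumPy sketch on synthetic data, illustrative only.

```python
import numpy as np
from sklearn.datasets import make_classification

X, y = make_classification(n_samples=400, n_features=30, random_state=4)

# Signal-to-noise per feature: |mu_1 - mu_0| / (sigma_1 + sigma_0).
mu1, mu0 = X[y == 1].mean(axis=0), X[y == 0].mean(axis=0)
sd1, sd0 = X[y == 1].std(axis=0), X[y == 0].std(axis=0)
s2n = np.abs(mu1 - mu0) / (sd1 + sd0 + 1e-12)  # epsilon avoids divide-by-zero
print(np.argsort(-s2n)[:5])  # top 5 features by signal-to-noise
```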
Bioinformatics and Biomedicine | 2011
David J. Dittman; Taghi M. Khoshgoftaar; Randall Wald; Huanjing Wang
One major problem faced when analyzing DNA microarrays is their high dimensionality (large number of features). Therefore, feature selection is a necessary step when using these datasets. However, the addition or removal of instances can alter the subsets chosen by a feature selection technique. The ideal situation is to choose a feature selection technique that is robust (stable) to changes in the number of instances, with the selected features changing little even when instances are added or removed. In this study, we test the stability of nineteen feature selection techniques across twenty-six datasets with varying levels of class imbalance. Our results show that the best choice of technique depends on the class balance of the datasets: the top performers are Deviance for balanced datasets, Signal-to-Noise for slightly imbalanced datasets, and AUC for imbalanced datasets. SVM-RFE was the least stable feature selection technique across the board, while other poor performers include Gain Ratio, Gini Index, Probability Ratio, and Power. We also found that enough changes to the dataset can make any feature selection technique unstable, and that using more features increases the stability of most feature selection techniques. Most intriguing was our finding that the more imbalanced a dataset is, the more stable the feature subsets built for that dataset will be. Overall, we conclude that stability is an important aspect of feature ranking which must be taken into account when planning a feature selection strategy or when adding or removing instances from a dataset.
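Stability can be quantified by perturbing the instances and measuring overlap between the subsets selected on each perturbed sample. A minimal sketch using mean pairwise Jaccard similarity, assuming scikit-learn; the study's exact stability measure and rankers may differ.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.feature_selection import f_classif

X, y = make_classification(n_samples=500, n_features=100, random_state=5)
rng = np.random.default_rng(5)

def top_k(Xs, ys, k=10):
    # Select the k highest-scoring features under an ANOVA F-test ranker.
    return set(np.argsort(-f_classif(Xs, ys)[0])[:k])

# Perturb by subsampling 90% of the instances, ten times.
subsets = []
for _ in range(10):
    idx = rng.choice(len(y), size=int(0.9 * len(y)), replace=False)
    subsets.append(top_k(X[idx], y[idx]))

# Stability = average Jaccard overlap over all pairs of selected subsets.
pairs = [(a, b) for i, a in enumerate(subsets) for b in subsets[i + 1:]]
jac = np.mean([len(a & b) / len(a | b) for a, b in pairs])
print(f"mean pairwise Jaccard stability: {jac:.3f}")
```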
International Conference on Data Mining | 2009
Huanjing Wang; Taghi M. Khoshgoftaar; Kehan Gao; Naeem Seliya
A large system often goes through multiple software project development cycles, in part due to changes in operation and development environments. For example, rapid turnover of the development team between releases can influence software quality, making it important to mine software project data over multiple system releases when building defect predictors. Data collection of software attributes is often conducted independently of quality improvement goals, leading to the availability of a large number of attributes for analysis. The variations in development process, data collection, and quality goals from one release to another emphasize the importance of selecting a best set of software attributes for software quality prediction. Moreover, it is intuitive to remove attributes that do not add to, or have an adverse effect on, the knowledge of the consequent model. Based on data from real-world software projects, we present a large case study that compares wrapper-based feature ranking techniques (WRT) and our proposed hybrid feature selection (HFS) technique. The comparison is done using both three-fold cross-validation (CV) and three-fold cross-validation with risk impact (CVR). It is shown that HFS is better than WRT, while CV is superior to CVR.
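Wrapper-based feature ranking scores each attribute by the cross-validated performance of a learner trained on that attribute alone. A minimal sketch assuming scikit-learn; the learner, fold count, and scoring are illustrative, not the paper's configuration.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.model_selection import cross_val_score
from sklearn.naive_bayes import GaussianNB

X, y = make_classification(n_samples=300, n_features=20, random_state=6)

# Wrapper ranking: each metric is scored by the 3-fold CV AUC of a
# learner trained on that single metric.
scores = np.array([
    cross_val_score(GaussianNB(), X[:, [j]], y, cv=3,
                    scoring="roc_auc").mean()
    for j in range(X.shape[1])
])
print(np.argsort(-scores)[:5])  # most predictive single metrics
```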
International Journal of Software Engineering and Knowledge Engineering | 2011
Huanjing Wang; Taghi M. Khoshgoftaar; Jason Van Hulse; Kehan Gao
Real-world software systems are becoming larger, more complex, and much more unpredictable. Software systems face many risks in their life cycles. Software practitioners strive to improve software quality by constructing defect prediction models using metric (feature) selection techniques. Finding faulty components in a software system can lead to a more reliable final system and reduce development and maintenance costs. This paper presents an empirical study of six commonly used filter-based software metric rankers and our proposed ensemble technique using rank ordering of the features (mean or median), applied to three large software projects using five commonly used learners. The classification accuracy was evaluated in terms of the area under the receiver operating characteristic curve (AUC) performance metric. Results demonstrate that the ensemble technique performed better overall than any individual ranker and also possessed better robustness. The empirical study also shows that variations among rankers, learners and software projects significantly impacted the classification outcomes, and that the ensemble method can smooth out performance.
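A median-rank variant of the ensemble, with the resulting subset evaluated by cross-validated AUC, can be sketched as follows, assuming scikit-learn and SciPy; the rankers and learner are illustrative stand-ins for the six rankers and five learners in the study.

```python
import numpy as np
from scipy.stats import rankdata
from sklearn.datasets import make_classification
from sklearn.feature_selection import chi2, f_classif, mutual_info_classif
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score

X, y = make_classification(n_samples=400, n_features=60, random_state=7)
Xn = X - X.min(axis=0)  # chi2 needs non-negative inputs

# Aggregate per-ranker ranks by their median instead of the mean.
ranks = np.array([rankdata(-s) for s in
                  (chi2(Xn, y)[0], f_classif(X, y)[0],
                   mutual_info_classif(X, y, random_state=7))])
top = np.argsort(np.median(ranks, axis=0))[:10]

# Evaluate the ensemble-selected subset with 5-fold cross-validated AUC.
auc = cross_val_score(LogisticRegression(max_iter=1000),
                      X[:, top], y, cv=5, scoring="roc_auc").mean()
print(f"AUC with ensemble-selected metrics: {auc:.3f}")
```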
information reuse and integration | 2010
Huanjing Wang; Taghi M. Khoshgoftaar; Kehan Gao
One factor that affects the success of machine learning is the presence of irrelevant or redundant information in the training data set. Filter-based feature ranking techniques (rankers) rank features according to their relevance to the target attribute, and the most relevant features are then chosen to build classification models. In order to evaluate the effectiveness of different feature ranking techniques, a commonly used method is to assess the classification performance of models built with the respective selected feature subsets in terms of a given performance metric (e.g., classification accuracy or misclassification rate). Since a given performance metric can usually capture only one specific aspect of classification performance, it may be unable to evaluate the classification performance from different perspectives. Also, there is no general consensus among researchers and practitioners regarding which performance metrics should be used for evaluating classification performance. In this study, we investigated six filter-based feature ranking techniques and built classification models using five different classifiers. The models were evaluated using eight different performance metrics. All experiments were conducted on four imbalanced data sets from a telecommunications software system. The experimental results demonstrate that the choice of a performance metric may significantly influence the classification evaluation conclusion: one ranker may outperform another when using a given performance metric, but for a different performance metric the results may be reversed. In this study, we found five distinct patterns when using eight performance metrics to order six feature selection techniques.
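The point that metric choice changes conclusions is easy to reproduce: compute several metrics for one model on imbalanced data and compare. A minimal sketch assuming scikit-learn; the data and learner are illustrative, not the telecommunications system used in the study.

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import (accuracy_score, balanced_accuracy_score,
                             f1_score, roc_auc_score)
from sklearn.model_selection import train_test_split

# weights=[0.9] makes the negative class 90% of instances, a class
# imbalance typical of defect data.
X, y = make_classification(n_samples=500, n_features=20, weights=[0.9],
                           random_state=8)
Xtr, Xte, ytr, yte = train_test_split(X, y, stratify=y, random_state=8)

clf = LogisticRegression(max_iter=1000).fit(Xtr, ytr)
pred, prob = clf.predict(Xte), clf.predict_proba(Xte)[:, 1]
for name, val in [("accuracy", accuracy_score(yte, pred)),
                  ("balanced accuracy", balanced_accuracy_score(yte, pred)),
                  ("F-measure", f1_score(yte, pred)),
                  ("AUC", roc_auc_score(yte, prob))]:
    print(f"{name:>18}: {val:.3f}")  # accuracy can flatter; F and AUC differ
```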
International Journal on Artificial Intelligence Tools | 2013
Huanjing Wang; Taghi M. Khoshgoftaar; Qianhui Althea Liang
Software metrics (features or attributes) are collected during the software development cycle. Metric selection is one of the most important preprocessing steps in the process of building defect prediction models and may improve the final prediction result. However, the addition or removal of program modules (instances or samples) can alter the subsets chosen by a feature selection technique, rendering the previously selected feature sets invalid. Very limited research has been done considering both stability (or robustness) and defect prediction model performance together in the software engineering domain, despite the importance of both aspects when choosing a feature selection technique. In this paper, we test the stability and classification model performance of eighteen feature selection techniques as the magnitude of change to the datasets and the size of the selected feature subsets are varied. All experiments were conducted on sixteen datasets from three real-world software projects. The experimental results demonstrate that Gain Ratio shows the least stability while two different versions of ReliefF show the most stability, followed by the threshold-based feature selection techniques based on PRC and AUC. Results also show that the signal-to-noise ranker performed moderately in terms of robustness and was the best ranker in terms of model performance. Finally, we conclude that while for some rankers stability and classification performance are correlated, this is not true for other rankers, and therefore performance according to one scheme (stability or model performance) cannot be used to predict performance according to the other.
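Checking whether stability and model performance are correlated across rankers reduces to a rank correlation over per-ranker summaries. A minimal sketch using Spearman's rho; the arrays below are hypothetical placeholder values, not results from the paper.

```python
import numpy as np
from scipy.stats import spearmanr

# Hypothetical per-ranker summaries: one stability score and one mean
# model-performance (AUC) score for each of six rankers.
stability = np.array([0.42, 0.77, 0.75, 0.68, 0.61, 0.55])
performance = np.array([0.70, 0.72, 0.73, 0.74, 0.76, 0.69])

# A rho near zero would mean stability cannot predict model performance.
rho, p = spearmanr(stability, performance)
print(f"Spearman rho = {rho:.2f} (p = {p:.2f})")
```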