Guoxun Wang
University of Electronic Science and Technology of China
Publication
Featured research published by Guoxun Wang.
Information Sciences | 2014
Gang Kou; Yi Peng; Guoxun Wang
The evaluation of clustering algorithms is intrinsically difficult because of the lack of objective measures. Since the evaluation of clustering algorithms normally involves multiple criteria, it can be modeled as a multiple criteria decision making (MCDM) problem. This paper presents an MCDM-based approach to rank a selection of popular clustering algorithms in the domain of financial risk analysis. An experimental study is designed to validate the proposed approach using three MCDM methods, six clustering algorithms, and eleven cluster validity indices over three real-life credit risk and bankruptcy risk data sets. The results demonstrate the effectiveness of MCDM methods in evaluating clustering algorithms and indicate that the repeated-bisection method leads to good 2-way clustering solutions on the selected financial risk data sets.
International Journal of Information Technology and Decision Making | 2011
Yi Peng; Gang Kou; Guoxun Wang; Wenshuai Wu; Yong Shi
Classification algorithms that help to identify software defects or faults play a crucial role in software risk management. Experimental results have shown that ensembles of classifiers are often more accurate and more robust to the effects of noisy data, and achieve a lower average error rate, than any of the constituent classifiers. However, inconsistencies exist across studies, and the performance of learning algorithms may vary with different performance measures and under different circumstances. Therefore, more research is needed to evaluate the performance of ensemble algorithms in software defect prediction. The goal of this paper is to assess the quality of ensemble methods in software defect prediction with the analytic hierarchy process (AHP), a multicriteria decision-making approach that prioritizes decision alternatives based on pairwise comparisons. Through the application of the AHP, this study experimentally compares the performance of several popular ensemble methods using 13 different performance metrics over 10 public-domain software defect datasets from the NASA Metrics Data Program (MDP) repository. The results indicate that ensemble methods can improve the classification results of software defect prediction in general and that AdaBoost gives the best results. In addition, tree- and rule-based classifiers perform better in software defect prediction than the other types of classifiers included in the experiment. Among single classifiers, K-nearest-neighbor, C4.5, and Naive Bayes tree ranked higher than the others.
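To illustrate how the AHP turns pairwise comparisons into priorities, here is a minimal sketch. It uses the row geometric-mean approximation of the priority vector, which is an assumption made for brevity and not necessarily the procedure used in the paper. The comparison matrix and the function name `ahp_priorities` are hypothetical and do not come from the study.

```python
import math

def ahp_priorities(matrix):
    """Approximate AHP priority weights with the row geometric-mean method.

    matrix[i][j] states how strongly alternative i is preferred over
    alternative j on Saaty's 1-9 scale; matrix[j][i] is the reciprocal.
    """
    n = len(matrix)
    geo_means = [math.prod(row) ** (1.0 / n) for row in matrix]
    total = sum(geo_means)
    return [g / total for g in geo_means]

# Hypothetical judgments for three alternatives (illustrative only):
# A is preferred 2x over B and 4x over C; B is preferred 2x over C.
pairwise = [
    [1.0, 2.0, 4.0],
    [1 / 2, 1.0, 2.0],
    [1 / 4, 1 / 2, 1.0],
]
weights = ahp_priorities(pairwise)
ranking = sorted(range(len(weights)), key=lambda i: weights[i], reverse=True)
```

Because this example matrix is perfectly consistent, the weights come out in the ratio 4:2:1. Real judgment matrices are rarely consistent, so a full AHP analysis would normally also check a consistency ratio before trusting the weights.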
International Journal of Information Technology and Decision Making | 2009
Yi Peng; Gang Kou; Guoxun Wang; Honggang Wang; Franz I. S. Ko
Software development involves plenty of risks, and errors in software modules represent a major kind of risk. Software defect prediction techniques and tools that identify software errors play a crucial role in software risk management. Among software defect prediction techniques, classification is a commonly used approach. Various types of classifiers have been applied to software defect prediction in recent years. Selecting an adequate classifier (or set of classifiers) to identify error-prone software modules is an important task for software development organizations. There are many different measures for classifiers, and each measure is intended to assess a different aspect of a classifier. This paper develops a performance metric that combines various measures to evaluate the quality of classifiers for software defect prediction. The performance metric is analyzed experimentally using 13 classifiers on 11 public-domain software defect datasets. The results of the experiment indicate that support vector machines (SVM), the C4.5 algorithm, and the K-nearest-neighbor algorithm ranked as the top three classifiers.
Information Sciences | 2012
Yi Peng; Guoxun Wang; Honggang Wang
A variety of classification algorithms for software defect detection have been developed over the years. How to select an appropriate classifier for a given task is an important issue in data mining and knowledge discovery (DMKD). Many studies have compared different types of classification algorithms, and the performance of these algorithms may vary with different performance measures and under different circumstances. Since the algorithm selection task needs to examine several criteria, such as accuracy, computational time, and misclassification rate, it can be modeled as a multiple criteria decision making (MCDM) problem. The goal of this paper is to use a set of MCDM methods to rank classification algorithms, with empirical results based on software defect detection datasets. Since the preferences of the decision maker (DM) play an important role in algorithm evaluation and selection, this paper involves the DM in the ranking procedure by assigning user weights to the performance measures. Four MCDM methods are examined using 38 classification algorithms and 13 evaluation criteria over 10 public-domain software defect datasets. The results indicate that the boosting of CART and the boosting of the C4.5 decision tree are ranked as the most appropriate algorithms for software defect datasets. Though the MCDM methods provide some conflicting results for the selected software defect datasets, they agree on most top-ranked classification algorithms.
International Conference on Computational Science | 2009
Guangli Nie; Guoxun Wang; Peng Zhang; Yingjie Tian; Yong Shi
In this paper, we propose a framework for the whole process of churn prediction for credit card holders. To make the knowledge extracted through data mining more actionable, we take the execution of the model into account throughout the process, from variable design to model interpretation. Using logistic regression, we build a model based on the data of more than 5000 credit card holders. The model performs very well in tests.
International Conference on Information Sciences and Interaction Sciences | 2010
Yu Shi; Gang Kou; Youyuan Li; Guoxun Wang; Yi Peng; Yong Shi
In this study, we propose a fuzzy multi-criteria decision-making (FMCDM) hybrid approach that integrates the Fuzzy Analytic Hierarchy Process (FAHP) and the Fuzzy Technique for Order Preference by Similarity to Ideal Solution (FTOPSIS) to evaluate the severity of typhoon damage in various provinces of China. To match the 1–9 scale of the AHP method, the statistical data on typhoon disaster losses in 2008 are normalized with fuzzy numbers and then processed by the proposed hybrid method of AHP and TOPSIS. The experimental results show that the proposed method is effective and can provide a comprehensive assessment of the level of typhoon damage.
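The ranking step can be sketched with plain (crisp) TOPSIS. This is a simplification: the paper works with fuzzy numbers and AHP-derived weights, neither of which is reproduced here. The region names, loss figures, weights, and the function name `topsis` are all made up for illustration and are not taken from the study.

```python
import math

def topsis(scores, weights, benefit):
    """Return TOPSIS closeness coefficients (0 = anti-ideal, 1 = ideal).

    scores:  one row per alternative, one column per criterion
    weights: criterion weights, assumed to sum to 1
    benefit: per criterion, True if larger values are better
    Assumes no column is all zeros and the alternatives are not all identical.
    """
    n_crit = len(weights)
    # Vector-normalize each column, then apply the criterion weights.
    norms = [math.sqrt(sum(row[j] ** 2 for row in scores)) for j in range(n_crit)]
    v = [[weights[j] * row[j] / norms[j] for j in range(n_crit)] for row in scores]
    columns = list(zip(*v))
    ideal = [max(c) if benefit[j] else min(c) for j, c in enumerate(columns)]
    anti = [min(c) if benefit[j] else max(c) for j, c in enumerate(columns)]
    closeness = []
    for row in v:
        d_pos = math.dist(row, ideal)  # distance to the ideal solution
        d_neg = math.dist(row, anti)   # distance to the anti-ideal solution
        closeness.append(d_neg / (d_pos + d_neg))
    return closeness

# Made-up loss figures: [affected population, economic loss]. Here "ideal"
# means most severe, so both criteria are treated as larger-is-more-severe.
regions = ["region_a", "region_b", "region_c"]
losses = [[900.0, 50.0], [400.0, 20.0], [100.0, 5.0]]
severity = topsis(losses, weights=[0.6, 0.4], benefit=[True, True])
most_severe = regions[max(range(len(regions)), key=lambda i: severity[i])]
```

With these illustrative inputs, `region_a` has the largest value on both criteria, so it coincides with the ideal point and receives the highest closeness coefficient.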
International Conference on Information Sciences and Interaction Sciences | 2010
Gang Kou; Chunwei Lou; Guoxun Wang; Yi Peng; Yu Tang; Shiming Li
One of the biggest challenges in emergency management is how to deal with incomplete, contradictory, and fuzzy information. This paper reviews related work and develops a framework for heterogeneous information integration in emergency management. A high-level data integration module, in which heterogeneous data sources are integrated and presented in a uniform format, is also described. Furthermore, we discuss data preprocessing, privacy protection, the data integration module, and implementation for emergency management in detail. This information integration framework for emergency management will help decision makers find satisfactory solutions, make the right decisions, and take appropriate responses in a timely and efficient manner.
Computer Science and Information Engineering | 2009
Guoxun Wang; Guangli Nie; Peng Zhang; Yong Shi
Market segmentation is one of the most important areas of knowledge-based marketing. For personal financial services in retail banks, it is a particularly challenging task because the databases are large and multidimensional. Conventional approaches to customer segmentation are knowledge based and often yield biased results. In contrast, data mining can handle massive amounts of data without overlooking important phenomena. In this paper, we choose a clustering ensemble method for customer segmentation because labeled data sets are not available. Based on experiments and tests in a real personal financial business, we conclude that our models reflect the true characteristics of various types of customers and can be used to find the investment orientations of customers.
Multiple Criteria Decision Making | 2009
Guoxun Wang; Fang Li; Peng Zhang; Yingjie Tian; Yong Shi
Personal financial market segmentation plays an important role in retail banking. It is widely acknowledged that conventional approaches to customer segmentation have many limitations: they are knowledge based and often yield biased results. In contrast, data mining can handle massive amounts of data without missing useful knowledge. Given the large volume of unlabeled transaction data, in this paper we propose a clustering ensemble method based on a majority voting mechanism, along with two alternative approaches, to further enhance the performance of customer segmentation in a real banking business. Based on experiments and examinations in a real business environment, we conclude that our model reflects the true characteristics of various types of customers and can be used to find the investment preferences of customers.
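A majority-voting clustering ensemble can be sketched as follows. The abstract does not say how cluster labels from different base clusterings are matched, so the label-alignment step here (trying every label permutation against the first clustering) is an assumption that is only practical for a small number of clusters. The function names and the toy labelings are hypothetical.

```python
from collections import Counter
from itertools import permutations

def align_labels(reference, labels, k):
    """Relabel `labels` with the permutation of 0..k-1 that agrees most with `reference`."""
    best = max(
        permutations(range(k)),
        key=lambda p: sum(p[l] == r for l, r in zip(labels, reference)),
    )
    return [best[l] for l in labels]

def majority_vote_ensemble(clusterings, k):
    """Combine base clusterings by per-sample majority vote after label alignment."""
    reference = clusterings[0]
    aligned = [list(reference)] + [
        align_labels(reference, c, k) for c in clusterings[1:]
    ]
    # For each sample, keep the label that most base clusterings agree on.
    return [Counter(votes).most_common(1)[0][0] for votes in zip(*aligned)]

# Toy example: three base clusterings of six customers into k = 3 segments.
base = [
    [0, 0, 1, 1, 2, 2],
    [1, 1, 0, 0, 2, 2],  # same partition as the first, with labels 0 and 1 swapped
    [0, 0, 1, 2, 2, 2],  # disagrees with the others on the fourth customer
]
consensus = majority_vote_ensemble(base, k=3)
```

In this toy case the second clustering is relabeled to match the first, and the third is outvoted on the fourth customer, so the consensus equals the first clustering. For larger numbers of clusters, an assignment method such as the Hungarian algorithm would be a more scalable way to align labels.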
Omega-International Journal of Management Science | 2011
Yi Peng; Gang Kou; Guoxun Wang; Yong Shi