
Publication


Featured research published by Charles X. Ling.


IEEE Transactions on Knowledge and Data Engineering | 2005

Using AUC and accuracy in evaluating learning algorithms

Jin Huang; Charles X. Ling

The area under the ROC (receiver operating characteristics) curve, or simply AUC, has been traditionally used in medical diagnosis since the 1970s. It has recently been proposed as an alternative single-number measure for evaluating the predictive ability of learning algorithms. However, no formal arguments were given as to why AUC should be preferred over accuracy. We establish formal criteria for comparing two different measures for learning algorithms and we show theoretically and empirically that AUC is a better measure (defined precisely) than accuracy. We then reevaluate well-established claims in machine learning based on accuracy using AUC and obtain interesting and surprising new results. For example, it has been well-established and accepted that Naive Bayes and decision trees are very similar in predictive accuracy. We show, however, that Naive Bayes is significantly better than decision trees in AUC. The conclusions drawn in this paper may make a significant impact on machine learning and data mining applications.
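The abstract's central claim can be made concrete with a small sketch (mine, not the paper's code) showing how AUC and accuracy are computed from a classifier's scores, and how two classifiers can tie on accuracy while differing in AUC:

```python
# A minimal sketch (not from the paper): AUC via the Mann-Whitney statistic,
# i.e. the probability that a randomly chosen positive is scored above a
# randomly chosen negative (ties count 0.5), versus thresholded accuracy.

def auc(scores, labels):
    """Area under the ROC curve from ranks, no explicit curve needed."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0
               for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

def accuracy(scores, labels, threshold=0.5):
    """Fraction of examples whose thresholded score matches the label."""
    return sum((s >= threshold) == (y == 1)
               for s, y in zip(scores, labels)) / len(labels)

labels = [1, 1, 1, 0, 0, 0]
a = [0.9, 0.8, 0.3, 0.6, 0.55, 0.1]   # same thresholded predictions as b
b = [0.9, 0.8, 0.3, 0.6, 0.55, 0.45]  # but b ranks a negative above more positives
```

Here `a` and `b` make identical thresholded predictions (accuracy 0.5 each), yet `a` ranks positives above negatives more often, so its AUC is higher, which is exactly the extra discrimination AUC captures.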


Soft Computing | 2010

DE/BBO: a hybrid differential evolution with biogeography-based optimization for global numerical optimization

Wenyin Gong; Zhihua Cai; Charles X. Ling

Differential evolution (DE) is a fast and robust evolutionary algorithm for global optimization. It has been widely used in many areas. Biogeography-based optimization (BBO) is a new biogeography-inspired algorithm. It mainly uses the biogeography-based migration operator to share information among solutions. In this paper, we propose a hybrid DE with BBO, namely DE/BBO, for the global numerical optimization problem. DE/BBO effectively combines the exploration of DE with the exploitation of BBO, and hence it can generate promising candidate solutions. To verify the performance of our proposed DE/BBO, 23 benchmark functions with a wide range of dimensions and diverse complexities are employed. Experimental results indicate that our approach is effective and efficient. Compared with other state-of-the-art DE approaches, DE/BBO performs better, or at least comparably, in terms of the quality of the final solutions and the convergence rate. In addition, the influence of the population size, dimensionality, different mutation schemes, and the self-adaptive control parameters of DE is also studied.
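A hypothetical sketch of the hybrid's core idea (my simplification, not the paper's exact DE/BBO): each trial dimension is produced either by BBO-style migration, copying a coordinate from a rank-weighted habitat (exploitation), or by DE/rand/1 mutation (exploration), followed by DE's greedy one-to-one selection.

```python
import random

def sphere(x):
    """Benchmark objective: f(x) = sum(x_i^2), minimum 0 at the origin."""
    return sum(v * v for v in x)

def debbo_generation(pop, f=0.5, cr=0.9, rng=None):
    """One generation of a toy DE/BBO-style hybrid on the sphere function."""
    rng = rng or random.Random(0)
    n = len(pop)
    order = sorted(range(n), key=lambda i: sphere(pop[i]))  # best first
    rank = {idx: r for r, idx in enumerate(order)}
    new_pop = []
    for i, x in enumerate(pop):
        r1, r2, r3 = rng.sample([j for j in range(n) if j != i], 3)
        immigration = (rank[i] + 1) / n     # worse habitats immigrate more
        trial = list(x)
        for d in range(len(x)):
            if rng.random() < cr:
                if rng.random() < immigration:
                    # BBO migration: roulette-select a source habitat,
                    # better-ranked solutions get larger weights.
                    src = rng.choices(range(n),
                                      weights=[n - rank[j] for j in range(n)])[0]
                    trial[d] = pop[src][d]
                else:
                    # DE/rand/1 mutation.
                    trial[d] = pop[r1][d] + f * (pop[r2][d] - pop[r3][d])
        # Greedy selection: keep the trial only if it is no worse.
        new_pop.append(trial if sphere(trial) <= sphere(x) else x)
    return new_pop
```

Because selection is greedy per individual, the best fitness in the population can never worsen across generations, which is the invariant the exploration/exploitation mix builds on.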


International Conference on Machine Learning | 2004

Decision trees with minimal costs

Charles X. Ling; Qiang Yang; Jianning Wang; Shichao Zhang

We propose a simple, novel, and effective method for building and testing decision trees that minimizes the sum of the misclassification and test costs. More specifically, we first put forward an original and simple splitting criterion for attribute selection in tree building. Our tree-building algorithm has many desirable properties for a cost-sensitive learning system that must account for both types of costs. Then, assuming that the test cases may have a large number of missing values, we design several intelligent test strategies that can suggest ways of obtaining the missing values at a cost in order to minimize the total cost. We experimentally compare these strategies with C4.5, and demonstrate that our new algorithms significantly outperform C4.5 and its variations. In addition, our algorithm's complexity is similar to that of C4.5, and is much lower than that of previous work. Our work is useful for many diagnostic tasks which must factor in the misclassification and test costs for obtaining missing information.
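A hedged illustration of the total-cost idea (my simplified criterion, with invented costs, not the paper's exact splitting rule): an attribute is worth testing only when the expected drop in misclassification cost exceeds the total cost of performing the test.

```python
# Toy cost-sensitive splitting criterion: misclassification-cost reduction
# from a split, minus the price of testing the attribute on every example.

def misclass_cost(labels, fp_cost=50.0, fn_cost=200.0):
    """Cost of the cheaper blanket prediction for a node's examples."""
    pos = sum(labels)
    neg = len(labels) - pos
    return min(pos * fn_cost,   # predict negative: pay for false negatives
               neg * fp_cost)   # predict positive: pay for false positives

def cost_gain(rows, labels, attr, test_cost):
    """Total-cost reduction from splitting on `attr` (can be negative)."""
    before = misclass_cost(labels)
    after = 0.0
    for value in set(r[attr] for r in rows):
        sub = [y for r, y in zip(rows, labels) if r[attr] == value]
        after += misclass_cost(sub)
    return before - after - test_cost * len(rows)

rows = [{"a": 0}, {"a": 0}, {"a": 1}, {"a": 1}]
labels = [0, 0, 1, 1]   # attribute "a" separates the classes perfectly
```

With these numbers, splitting on `a` removes all misclassification cost (100 units), so the split pays off at a test cost of 10 per example but not at 30, which is the trade-off the paper's criterion balances during tree building.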


Systems, Man, and Cybernetics | 2011

Enhanced Differential Evolution With Adaptive Strategies for Numerical Optimization

Wenyin Gong; Zhihua Cai; Charles X. Ling; Hui Li

Differential evolution (DE) is a simple, yet efficient, evolutionary algorithm for global numerical optimization, which has been widely used in many areas. However, the choice of the best mutation strategy is difficult for a specific problem. To alleviate this drawback and enhance the performance of DE, in this paper, we present a family of improved DE variants that attempt to adaptively choose a more suitable strategy for the problem at hand. In addition, in our proposed strategy adaptation mechanism (SaM), different parameter adaptation methods of DE can be used for different strategies. In order to test the efficiency of our approach, we combine our proposed SaM with JADE, which is a recently proposed DE variant, for numerical optimization. Twenty widely used scalable benchmark problems are chosen from the literature as the test suite. Experimental results verify our expectation that the SaM is able to adaptively determine a more suitable strategy for a specific problem. Compared with other state-of-the-art DE variants, our approach performs better, or at least comparably, in terms of the quality of the final solutions and the convergence rate. Finally, we validate the powerful capability of our approach by solving two real-world optimization problems.
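One way to picture strategy adaptation (a hypothetical sketch, not the paper's SaM): keep a smoothed success rate per mutation strategy and sample strategies in proportion to it, so strategies that keep improving solutions are chosen more often as the run proceeds.

```python
import random

class StrategyAdapter:
    """Toy strategy-adaptation bookkeeping for a DE-style loop."""

    def __init__(self, n_strategies, rng=None):
        self.success = [1.0] * n_strategies  # Laplace smoothing avoids
        self.trials = [2.0] * n_strategies   # starving any strategy early on
        self.rng = rng or random.Random(0)

    def pick(self):
        """Sample a strategy index in proportion to its success rate."""
        rates = [s / t for s, t in zip(self.success, self.trials)]
        return self.rng.choices(range(len(rates)), weights=rates)[0]

    def update(self, k, improved):
        """Record whether the trial built with strategy k beat its parent."""
        self.trials[k] += 1
        if improved:
            self.success[k] += 1
```

In a DE loop, `pick()` would select the mutation strategy for each individual and `update()` would record whether the resulting trial replaced its parent, steering future picks toward whichever strategy suits the problem.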


Canadian Conference on Artificial Intelligence | 2003

AUC: a better measure than accuracy in comparing learning algorithms

Charles X. Ling; Jin Huang; Harry Zhang

Predictive accuracy has been widely used as the main criterion for comparing the predictive ability of classification systems (such as C4.5, neural networks, and Naive Bayes). Most of these classifiers also produce probability estimations of the classification, but they are completely ignored in the accuracy measure. This is often taken for granted because both training and testing sets only provide class labels. In this paper we establish rigorously that, even in this setting, the area under the ROC (Receiver Operating Characteristics) curve, or simply AUC, provides a better measure than accuracy. Our result is quite significant for three reasons. First, we establish, for the first time, rigorous criteria for comparing evaluation measures for learning algorithms. Second, it suggests that AUC should replace accuracy when measuring and comparing classification systems. Third, our result also prompts us to reevaluate many well-established conclusions based on accuracy in machine learning. For example, it is well accepted in the machine learning community that, in terms of predictive accuracy, Naive Bayes and decision trees are very similar. Using AUC, however, we show experimentally that Naive Bayes is significantly better than the decision-tree learning algorithms.


International Conference on Data Mining | 2003

Comparing naive Bayes, decision trees, and SVM with AUC and accuracy

Jin Huang; Jingjing Lu; Charles X. Ling

Predictive accuracy has often been used as the main and often only evaluation criterion for the predictive performance of classification or data mining algorithms. In recent years, the area under the ROC (receiver operating characteristics) curve, or simply AUC, has been proposed as an alternative single-number measure for evaluating performance of learning algorithms. We proved that AUC is, in general, a better measure (defined precisely) than accuracy. Many popular data mining algorithms should then be reevaluated in terms of AUC. For example, it is well accepted that Naive Bayes and decision trees are very similar in accuracy. How do they compare in AUC? Also, how does the recently developed SVM (support vector machine) compare to traditional learning algorithms in accuracy and AUC? We will answer these questions. Our conclusions will provide important guidelines in data mining applications on real-world datasets.


Applied Mathematics and Computation | 2010

A real-coded biogeography-based optimization with mutation

Wenyin Gong; Zhihua Cai; Charles X. Ling; Hui Li

Biogeography-based optimization (BBO) is a new biogeography-inspired algorithm for global optimization. There are some open research questions that need to be addressed for BBO. In this paper, we extend the original BBO and present a real-coded BBO approach, referred to as RCBBO, for global optimization problems in the continuous domain. Furthermore, in order to improve the diversity of the population and enhance the exploration ability of RCBBO, a mutation operator is integrated into RCBBO. Experiments have been conducted on 23 benchmark problems with a wide range of dimensions and diverse complexities. The results indicate the good performance of the proposed RCBBO method. Moreover, experimental results also show that the mutation operator can improve the performance of RCBBO effectively.


International Conference on Data Mining | 2004

Test-cost sensitive naive Bayes classification

Xiaoyong Chai; Lin Deng; Qiang Yang; Charles X. Ling

Inductive learning techniques such as the naive Bayes and decision tree algorithms have been extended in the past to handle different types of costs, mainly by distinguishing different costs of classification errors. However, it is an equally important issue to consider how to handle the test costs associated with querying the missing values in a test case. When the value of an attribute is missing in a test case, it may or may not be worthwhile to take the effort to obtain its missing value, depending on how much the value results in a potential gain in the classification accuracy. In this paper, we show how to obtain a test-cost sensitive naive Bayes classifier (csNB) by including a test strategy which determines how unknown attributes are selected to perform tests on in order to minimize the sum of the misclassification costs and test costs. We propose and evaluate several potential test strategies including one that allows several tests to be done at once. We empirically evaluate the csNB method, and show that it compares favorably with its decision tree counterpart.
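The query-or-not decision can be sketched as follows (an illustration with invented numbers, not the paper's csNB): under a naive Bayes model, a missing binary attribute is worth querying only when the expected drop in misclassification cost exceeds the test's price.

```python
# Toy value-of-information calculation for one missing binary attribute.

def expected_misclass_cost(p_pos, fp_cost, fn_cost):
    """Cost of the cheaper prediction given the current P(class = +)."""
    return min(p_pos * fn_cost,          # predict negative
               (1 - p_pos) * fp_cost)    # predict positive

def gain_from_test(prior_pos, likelihoods, fp_cost, fn_cost):
    """Expected cost reduction from observing one attribute.
    likelihoods maps each value v to (P(v | +), P(v | -))."""
    before = expected_misclass_cost(prior_pos, fp_cost, fn_cost)
    after = 0.0
    for p_v_pos, p_v_neg in likelihoods.values():
        p_v = prior_pos * p_v_pos + (1 - prior_pos) * p_v_neg
        posterior = prior_pos * p_v_pos / p_v if p_v > 0 else prior_pos
        after += p_v * expected_misclass_cost(posterior, fp_cost, fn_cost)
    return before - after

# A highly informative attribute: observing it is worth up to 40 cost units,
# so it should be queried when the test costs 25 but skipped at 50.
gain = gain_from_test(0.5, {0: (0.1, 0.9), 1: (0.9, 0.1)},
                      fp_cost=100.0, fn_cost=100.0)
```

A test strategy in this spirit would rank the missing attributes by such a gain and query those whose gain exceeds their test cost; the paper's strategies, including the batch variant, refine this basic comparison.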


International Conference on Machine Learning | 2008

Discriminative parameter learning for Bayesian networks

Jiang Su; Harry Zhang; Charles X. Ling; Stan Matwin

Bayesian network classifiers have been widely used for classification problems. Given a fixed Bayesian network structure, parameter learning can take two different approaches: generative and discriminative learning. While generative parameter learning is more efficient, discriminative parameter learning is more effective. In this paper, we propose a simple, efficient, and effective discriminative parameter learning method, called Discriminative Frequency Estimate (DFE), which learns parameters by discriminatively computing frequencies from data. Empirical studies show that the DFE algorithm integrates the advantages of both generative and discriminative learning: it performs as well as the state-of-the-art discriminative parameter learning method ELR in accuracy, but is significantly more efficient.
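The DFE idea as described above can be sketched like this (the details are my simplification, not the authors' exact algorithm): instead of adding 1 to each count, as generative frequency estimation does, add the prediction loss 1 - P̂(true class | x), so examples the current model already classifies confidently barely move the parameters.

```python
from collections import defaultdict

def predict_proba(x, classes, class_cnt, feat_cnt, alpha=1.0):
    """Naive Bayes posterior from (possibly fractional) counts,
    with Laplace smoothing alpha (binary attributes assumed)."""
    scores = {}
    total = sum(class_cnt.values()) + alpha * len(classes)
    for c in classes:
        p = (class_cnt[c] + alpha) / total
        for j, v in enumerate(x):
            p *= (feat_cnt[(c, j, v)] + alpha) / (class_cnt[c] + 2 * alpha)
        scores[c] = p
    z = sum(scores.values())
    return {c: s / z for c, s in scores.items()}

def dfe_train(data, labels, classes, n_passes=5):
    """Discriminative frequency updates: count increments are loss-sized."""
    class_cnt = defaultdict(float)
    feat_cnt = defaultdict(float)   # (class, attr_index, value) -> weight
    for _ in range(n_passes):
        for x, y in zip(data, labels):
            w = 1.0 - predict_proba(x, classes, class_cnt, feat_cnt)[y]
            class_cnt[y] += w
            for j, v in enumerate(x):
                feat_cnt[(y, j, v)] += w
    return class_cnt, feat_cnt
```

The structure mirrors plain frequency estimation, which is why the method keeps the generative approach's efficiency: each pass is a single sweep over the data with a prediction and a counted update per example.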


Bioinformatics | 2004

Exploiting the kernel trick to correlate fragment ions for peptide identification via tandem mass spectrometry

Yan Fu; Qiang Yang; Rui-Xiang Sun; Dequan Li; Rong Zeng; Charles X. Ling; Wen Gao

Motivation: The correlation among fragment ions in a tandem mass spectrum is crucial in reducing stochastic mismatches for peptide identification by database searching. Until now, an efficient scoring algorithm that considers the correlative information in a tunable and comprehensive manner has been lacking.

Results: This paper provides a promising approach to utilizing the correlative information for improving the peptide identification accuracy. The kernel trick, rooted in statistical learning theory, is exploited to address this issue with low computational effort. The common scoring method, the tandem mass spectral dot product (SDP), is extended to the kernel SDP (KSDP). Experiments on a dataset reported previously demonstrate the effectiveness of the KSDP. The implementation on consecutive fragments shows a decrease of 10% in the error rate compared with the SDP. Our software tool, pFind, using a simple scoring function based on the KSDP, outperforms two SDP-based software tools, SEQUEST and Sonar MS/MS, in terms of identification accuracy.

Supplementary information: http://www.jdl.ac.cn/user/yfu/pfind/index.html
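A toy formulation of the kernel-SDP idea (mine, not pFind's scoring function): the plain spectral dot product treats matched fragment ions independently, while a kernel can add cross terms that reward runs of consecutive matched ions, which are less likely to be stochastic matches.

```python
# Toy spectral scoring over aligned fragment-intensity vectors.

def sdp(spec_a, spec_b):
    """Plain spectral dot product: matched ions scored independently."""
    return sum(a * b for a, b in zip(spec_a, spec_b))

def ksdp_consecutive(spec_a, spec_b, bonus=0.5):
    """SDP plus a weighted cross term for adjacent fragment positions,
    so consecutive co-occurring ions score higher than scattered ones."""
    corr = sum(spec_a[i] * spec_b[i] * spec_a[i + 1] * spec_b[i + 1]
               for i in range(len(spec_a) - 1))
    return sdp(spec_a, spec_b) + bonus * corr

theory = [1, 1, 1, 1, 1]     # predicted fragment ions of a candidate peptide
run = [1, 1, 1, 0, 0]        # three consecutive matched ions
scattered = [1, 0, 1, 0, 1]  # three scattered matched ions
```

Both observed spectra match three ions, so plain SDP cannot tell them apart; the consecutive-run spectrum scores higher under the kernelized version, which is the kind of correlative information the paper exploits.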

Collaboration


Dive into Charles X. Ling's collaborations.

Top Co-Authors

Qiang Yang, Harbin Institute of Technology
Jun Du, University of Western Ontario
Jin Huang, University of Western Ontario
Victor S. Sheng, University of Central Arkansas
Harry Zhang, University of New Brunswick
Zhihua Cai, China University of Geosciences
Eileen A. Ni, University of Western Ontario
Wenyin Gong, China University of Geosciences