Publication


Featured research published by Liangxiao Jiang.


Engineering Applications of Artificial Intelligence | 2016

Deep feature weighting for naive Bayes and its application to text classification

Liangxiao Jiang; Chaoqun Li; Shasha Wang; Lungan Zhang

Naive Bayes (NB) continues to be one of the top 10 data mining algorithms because of its simplicity, efficiency, and efficacy. Among the numerous proposals to improve the accuracy of naive Bayes by weakening its feature independence assumption, the feature weighting approach has received relatively little attention from researchers. Moreover, to our knowledge, all of the existing feature weighting approaches only incorporate the learned feature weights into the classification formula of naive Bayes and do not incorporate them into its conditional probability estimates at all. In this paper, we propose a simple, efficient, and effective feature weighting approach, called deep feature weighting (DFW), which estimates the conditional probabilities of naive Bayes by deeply computing feature-weighted frequencies from the training data. Empirical studies on a collection of 36 benchmark datasets from the UCI repository show that naive Bayes with deep feature weighting rarely degrades the quality of the model compared to standard naive Bayes and, in many cases, improves it dramatically. In addition, we apply the proposed deep feature weighting to some state-of-the-art naive Bayes text classifiers and achieve remarkable improvements.
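
To make the idea concrete, here is a minimal Python sketch of feature-weighted naive Bayes in the spirit of DFW, assuming discrete features and already-learned feature weights w (the paper derives the weights from the training data; the function names and the Laplace smoothing below are illustrative, not the authors' exact formulation). The weights enter both the conditional probability estimates, via weighted frequency counts, and the classification formula, as exponents.

import numpy as np

# Feature-weighted naive Bayes in the spirit of DFW (illustrative sketch).
# X: (n_samples, n_features) integer-coded nominal features; y: integer class labels;
# w: (n_features,) precomputed feature weights; n_values: number of distinct values per feature.
def train_dfw_nb(X, y, w, n_values):
    classes = np.unique(y)
    prior = np.array([(np.sum(y == c) + 1.0) / (len(y) + len(classes)) for c in classes])
    cond = []  # cond[i][c, v] approximates P(a_i = v | c) using feature-weighted frequencies
    for i in range(X.shape[1]):
        table = np.ones((len(classes), n_values[i]))          # Laplace smoothing
        for ci, c in enumerate(classes):
            Xi = X[y == c, i]
            for v in range(n_values[i]):
                table[ci, v] += w[i] * np.sum(Xi == v)        # weighted frequency count
            table[ci] /= table[ci].sum()
        cond.append(table)
    return classes, prior, cond

def predict_dfw_nb(x, w, classes, prior, cond):
    log_post = np.log(prior)
    for i, v in enumerate(x):
        log_post = log_post + w[i] * np.log(cond[i][:, v])    # weights also enter the classification formula
    return classes[int(np.argmax(log_post))]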


Knowledge and Information Systems | 2015

Adapting naive Bayes tree for text classification

Shasha Wang; Liangxiao Jiang; Chaoqun Li

Naive Bayes (NB) is one of the top 10 algorithms thanks to its simplicity, efficiency, and interpretability. To weaken its attribute independence assumption, the naive Bayes tree (NBTree) has been proposed. NBTree is a hybrid algorithm that deploys a naive Bayes classifier on each leaf node of the built decision tree and has demonstrated remarkable classification performance. When it comes to text classification tasks, multinomial naive Bayes (MNB) has become a dominant modeling approach, succeeding the multi-variate Bernoulli model. Inspired by the success of NBTree, we propose a new algorithm called multinomial naive Bayes tree (MNBTree), which deploys a multinomial naive Bayes text classifier on each leaf node of the built decision tree. Different from NBTree, MNBTree builds a binary tree in which the values of the split attributes are simply divided into zero and nonzero. At the same time, MNBTree uses the information gain measure instead of the classification accuracy measure to build the tree, which reduces the time consumption. To further improve the classification performance of MNBTree, we propose its multiclass learning version, called multiclass multinomial naive Bayes tree (MMNBTree), by applying the multiclass technique to MNBTree. The experimental results on a large number of widely used text classification benchmark datasets validate the effectiveness of our proposed algorithms, MNBTree and MMNBTree.
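
The following is a rough Python sketch of the MNBTree idea under stated assumptions: scikit-learn's MultinomialNB stands in for the leaf text classifiers, splits are binary on whether a term count is zero or nonzero, the split attribute is chosen by information gain on that binarization, and a simple depth/size limit (hypothetical here) replaces the paper's exact stopping rule. Class labels are assumed to be coded as non-negative integers.

import numpy as np
from sklearn.naive_bayes import MultinomialNB

def info_gain(y, mask):
    # information gain of splitting the labels y by the boolean mask (zero vs. nonzero counts)
    def entropy(labels):
        if len(labels) == 0:
            return 0.0
        p = np.bincount(labels) / len(labels)
        p = p[p > 0]
        return -np.sum(p * np.log2(p))
    n = len(y)
    return entropy(y) - mask.sum() / n * entropy(y[mask]) - (~mask).sum() / n * entropy(y[~mask])

class MNBTreeNode:
    # binary tree whose splits are "term count is zero vs. nonzero"; an MNB classifier sits at each leaf
    def __init__(self, X, y, depth=0, max_depth=3, min_leaf=50):
        gains = [info_gain(y, X[:, j] > 0) for j in range(X.shape[1])]
        j = int(np.argmax(gains))
        mask = X[:, j] > 0
        if depth >= max_depth or gains[j] <= 0 or mask.sum() < min_leaf or (~mask).sum() < min_leaf:
            self.split = None
            self.leaf = MultinomialNB().fit(X, y)
        else:
            self.split = j
            self.nonzero = MNBTreeNode(X[mask], y[mask], depth + 1, max_depth, min_leaf)
            self.zero = MNBTreeNode(X[~mask], y[~mask], depth + 1, max_depth, min_leaf)

    def predict(self, x):
        if self.split is None:
            return self.leaf.predict(x.reshape(1, -1))[0]
        child = self.nonzero if x[self.split] > 0 else self.zero
        return child.predict(x)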


Knowledge and Information Systems | 2009

Learning decision tree for ranking

Liangxiao Jiang; Chaoqun Li; Zhihua Cai

The decision tree is one of the most effective and widely used methods for classification. However, many real-world applications require instances to be ranked by the probability of class membership. The area under the receiver operating characteristic curve, or simply AUC, has recently been used as a measure of the ranking performance of learning algorithms. In this paper, we present two novel class probability estimation algorithms to improve the ranking performance of decision trees. Instead of estimating the probability of class membership using simple voting at the leaf into which the test instance falls, our algorithms use similarity-weighted voting and naive Bayes. We design empirical experiments to verify that our new algorithms significantly outperform the recent decision tree ranking algorithm C4.4 in terms of AUC.
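
As a small illustration of the second idea (naive Bayes at the leaves), the sketch below estimates class membership probabilities with a naive Bayes model trained only on the training instances that reach the same leaf as the test instance, instead of simple voting. It assumes a fitted scikit-learn decision tree and integer-coded nominal features, and it is a stand-in for, not a reproduction of, the authors' algorithms. A tree grown with a fairly large min_samples_leaf gives each leaf enough instances for the NB estimate.

import numpy as np
from sklearn.tree import DecisionTreeClassifier
from sklearn.naive_bayes import CategoricalNB

def leaf_nb_proba(tree, X_train, y_train, X_test):
    # class-probability estimates from an NB model fitted only on the training instances
    # that fall into the same leaf as each test instance (instead of simple leaf voting)
    train_leaves = tree.apply(X_train)
    test_leaves = tree.apply(X_test)
    proba = np.zeros((len(X_test), len(tree.classes_)))
    class_order = list(tree.classes_)
    for leaf in np.unique(test_leaves):
        in_leaf = train_leaves == leaf
        nb = CategoricalNB(min_categories=X_train.max(axis=0) + 1)
        nb.fit(X_train[in_leaf], y_train[in_leaf])
        rows = test_leaves == leaf
        p = nb.predict_proba(X_test[rows])
        for k, c in enumerate(nb.classes_):                 # align NB's class order with the tree's
            proba[rows, class_order.index(c)] = p[:, k]
    return proba

# usage sketch:
#   tree = DecisionTreeClassifier(min_samples_leaf=30).fit(X_train, y_train)
#   scores = leaf_nb_proba(tree, X_train, y_train, X_test)   # feed these scores into an AUC computation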


Information Sciences | 2016

Structure extended multinomial naive Bayes

Liangxiao Jiang; Shasha Wang; Chaoqun Li; Lungan Zhang

Multinomial naive Bayes (MNB) is widely used for text classification. We summarize all categories of the existing improved approaches to MNB. We propose structure extended multinomial naive Bayes (SEMNB). SEMNB averages all of the weighted one-dependence multinomial estimators. The experimental results validate its effectiveness.

Multinomial naive Bayes (MNB) assumes that all attributes (i.e., features) are independent of each other given the context of the class, and it ignores all dependencies among attributes. However, in many real-world applications, the attribute independence assumption required by MNB is often violated and thus harms its performance. To weaken this assumption, one of the most direct ways is to extend its structure to explicitly represent attribute dependencies by adding arcs between attributes. On the other hand, although a Bayesian network can represent arbitrary attribute dependencies, learning an optimal Bayesian network from high-dimensional text data is almost impossible, mainly because learning the optimal structure of a Bayesian network from such data is extremely time and space consuming. Thus, it would be desirable if a multinomial Bayesian network model could avoid structure learning and still represent attribute dependencies to some extent. In this paper, we propose a novel model called structure extended multinomial naive Bayes (SEMNB). SEMNB alleviates the attribute independence assumption by averaging all of the weighted one-dependence multinomial estimators. To learn SEMNB, we propose a simple but effective learning algorithm without structure searching. The experimental results on a large suite of benchmark text datasets show that SEMNB significantly outperforms MNB and is even markedly better than three other state-of-the-art improved algorithms, including TDM, DWMNB, and Rw,cMNB.
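
A rough Python sketch of the averaging idea is given below, assuming bag-of-words count matrices. The choice of parent words (only terms present in the document), the weights (relative term frequency), and the Laplace smoothing are simplifying assumptions for illustration, not the paper's definitions, and the dense n_words x n_words tables are only practical for small vocabularies.

import numpy as np

def train_semnb(X, y):
    # X: (n_docs, n_words) term-count matrix; y: integer class labels
    classes = np.unique(y)
    n_words = X.shape[1]
    prior = np.array([(np.sum(y == c) + 1.0) / (len(y) + len(classes)) for c in classes])
    parent_prob = np.zeros((len(classes), n_words))        # P(parent word present | class)
    cond = np.ones((len(classes), n_words, n_words))       # P(word | class, parent present), Laplace-smoothed
    for ci, c in enumerate(classes):
        Xc = X[y == c]
        for p in range(n_words):
            docs = Xc[Xc[:, p] > 0]
            parent_prob[ci, p] = (len(docs) + 1.0) / (len(Xc) + 2.0)
            cond[ci, p] += docs.sum(axis=0)
            cond[ci, p] /= cond[ci, p].sum()
    return classes, prior, parent_prob, cond

def predict_semnb(x, classes, prior, parent_prob, cond):
    parents = np.where(x > 0)[0]                 # only words present in the document act as parents
    weights = x[parents] / x[parents].sum()      # hypothetical weights: relative term frequency
    scores = np.zeros(len(classes))
    for w, p in zip(weights, parents):
        # one-dependence multinomial estimator with parent word p
        log_post = np.log(prior) + np.log(parent_prob[:, p]) + x @ np.log(cond[:, p, :]).T
        post = np.exp(log_post - log_post.max())
        scores += w * post / post.sum()          # weighted average of the one-dependence estimators
    return classes[int(np.argmax(scores))]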


Knowledge and Information Systems | 2017

Toward value difference metric with attribute weighting

Chaoqun Li; Liangxiao Jiang; Hongwei Li; Jia Wu; Peng Zhang

In distance metric learning, recent work has shown that the value difference metric (VDM), with a strong attribute independence assumption, outperforms other existing distance metrics. However, an open question is whether a VDM with a less restrictive assumption can perform even better. Many approaches have been proposed to improve VDM by weakening this assumption. In this paper, we provide a comprehensive survey of the existing improved approaches and then propose a new approach that improves VDM by attribute weighting. We name the proposed distance function the attribute-weighted value difference metric (AWVDM). Moreover, we propose a modified attribute-weighted value difference metric (MAWVDM) by incorporating the learned attribute weights into the conditional probability estimates of AWVDM. AWVDM and MAWVDM significantly outperform VDM while inheriting its computational simplicity. Experimental results on a large number of UCI data sets validate the performance of AWVDM and MAWVDM.
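
The sketch below shows a plain VDM with per-attribute weights, which is the general shape of AWVDM as described above. The attribute weights w are assumed to be given (the abstract does not say how they are learned), attributes are integer-coded nominal values, and q = 2 is the usual VDM exponent. MAWVDM would additionally push the weights into the conditional probability tables themselves.

import numpy as np

def vdm_tables(X, y, n_values):
    # tables[i][v, c] estimates P(class = c | attribute i takes value v), Laplace-smoothed
    classes = np.unique(y)
    tables = []
    for i in range(X.shape[1]):
        t = np.ones((n_values[i], len(classes)))
        for ci, c in enumerate(classes):
            vals, counts = np.unique(X[y == c, i], return_counts=True)
            t[vals, ci] += counts
        tables.append(t / t.sum(axis=1, keepdims=True))
    return tables

def awvdm_distance(x1, x2, w, tables, q=2):
    # attribute-weighted VDM: each per-attribute VDM term is scaled by its weight w[i]
    d = 0.0
    for i, t in enumerate(tables):
        d += w[i] * np.sum(np.abs(t[x1[i]] - t[x2[i]]) ** q)
    return d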


Frontiers of Computer Science in China | 2014

Naive Bayes for value difference metric

Chaoqun Li; Liangxiao Jiang; Hongwei Li

The value difference metric (VDM) is one of the best-known and most widely used distance functions for nominal attributes. This work applies the instance-weighting technique to improve VDM, and an instance-weighted value difference metric (IWVDM) is proposed. Different from prior work, IWVDM uses naive Bayes (NB) to find weights for training instances. Because early work has shown that there is a close relationship between VDM and NB, some work on NB can be applied to VDM. The weight of a training instance x belonging to class c is assigned according to the difference between the conditional probability P̂(c|x) estimated by NB and the true conditional probability P(c|x), and the weight is adjusted iteratively. Compared with previous work, IWVDM has the advantage of reducing the time complexity of the weight-finding process while simultaneously improving the performance of VDM. Experimental results on 36 UCI datasets validate the effectiveness of IWVDM.
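
The iterative weighting idea can be sketched as follows, assuming integer-coded nominal attributes and treating the true conditional probability of an instance's own class as 1. The naive Bayes estimator, the additive update, and the learning rate eta are simplified stand-ins, not the paper's exact scheme; the learned weights would then replace the unit counts in the VDM frequency tables.

import numpy as np
from sklearn.naive_bayes import CategoricalNB

def learn_instance_weights(X, y, n_iter=10, eta=0.5):
    # iterative instance weighting: increase the weight of instances whose own class
    # is underestimated by NB, i.e. where P_hat(c|x) falls short of the "true" value 1
    weights = np.ones(len(y))
    nb = CategoricalNB(min_categories=X.max(axis=0) + 1)
    for _ in range(n_iter):
        nb.fit(X, y, sample_weight=weights)
        proba = nb.predict_proba(X)
        p_true = proba[np.arange(len(y)), np.searchsorted(nb.classes_, y)]
        weights += eta * (1.0 - p_true)          # hypothetical additive update
        weights /= weights.mean()                # keep the weights on a comparable scale
    return weights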


Pattern Recognition Letters | 2016

Beyond accuracy

Ganggang Kong; Liangxiao Jiang; Chaoqun Li

Cost-sensitive learning is often desirable in many real-world applications. We review the related work on naive Bayes and test-cost-sensitive learning. We propose a new test-cost-sensitive naive Bayes algorithm. The proposed algorithm selects an optimal attribute subset with minimal test cost. Experimental results on a large number of datasets validate its effectiveness.

Some existing test-cost-sensitive learning algorithms focus on balancing the misclassification cost and the total test cost, while others focus on the balance between classification accuracy and the total test cost. So far, however, few works reduce the total test cost while at the same time maintaining high classification accuracy. To achieve this goal, this paper modifies the backward greedy search strategy employed in selective Bayesian classifiers (SBC), a state-of-the-art improved naive Bayes algorithm that pursues high classification accuracy but ignores the total test cost. We call the resulting model test-cost-sensitive naive Bayes (TCSNB). TCSNB conducts a modified backward greedy search to select an optimal attribute subset with the minimal total test cost, yet at the same time maintains the high classification accuracy that characterizes SBC. An extensive empirical study validates its effectiveness and efficiency.
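
A hedged sketch of the backward greedy elimination loop is shown below, using the training accuracy of a categorical naive Bayes as the quality criterion and, among attributes whose removal keeps accuracy within a tolerance, dropping the one with the highest test cost first. The exact modification the paper makes to SBC's search is not given in this abstract, so this is one plausible reading rather than the authors' procedure.

import numpy as np
from sklearn.naive_bayes import CategoricalNB

def nb_accuracy(X, y, attrs):
    Xs = X[:, attrs]
    nb = CategoricalNB(min_categories=Xs.max(axis=0) + 1).fit(Xs, y)
    return nb.score(Xs, y)

def tcsnb_select(X, y, test_cost, tol=0.0):
    attrs = list(range(X.shape[1]))
    base = nb_accuracy(X, y, attrs)
    while len(attrs) > 1:
        # attributes whose removal keeps training accuracy within tol of the current level
        candidates = [a for a in attrs
                      if nb_accuracy(X, y, [b for b in attrs if b != a]) >= base - tol]
        if not candidates:
            break
        drop = max(candidates, key=lambda a: test_cost[a])   # remove the costliest harmless attribute
        attrs.remove(drop)
        base = nb_accuracy(X, y, attrs)
    return attrs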


Pattern Recognition Letters | 2014

Local value difference metric

Chaoqun Li; Liangxiao Jiang; Hongwei Li

The value difference metric (VDM) is one of the widely used distance functions. We propose the local value difference metric (LVDM). LVDM uses a modified decision tree algorithm to find the neighborhood of the test instance. The experimental results on 36 UCI datasets validate its effectiveness.

The value difference metric (VDM) is one of the widely used distance functions designed to work with nominal attributes. Research has indicated that the definition of VDM follows naturally from a simple probabilistic model called naive Bayes (NB). NB assumes that all the attributes are independent given the class. To further improve the performance of NB, several techniques have been proposed; among these, an effective technique is local learning. Because VDM has a close relationship with NB, in this paper we propose a local learning method for VDM. The improved distance function is called the local value difference metric (LVDM). When LVDM computes the distance between a test instance and each training instance, the conditional probabilities in VDM are estimated by counting only from the neighborhood of the test instance instead of from all the training data. A modified decision tree algorithm is proposed to determine the neighborhood of the test instance. The experimental results on 43 datasets downloaded from the University of California at Irvine (UCI) repository show that the proposed LVDM significantly outperforms VDM in terms of the class-probability estimation performance of distance-based learning algorithms.
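
The sketch below illustrates the local estimation idea only: the conditional probabilities used by VDM are counted from a neighborhood of the test instance rather than from all training data. As a stand-in for the paper's modified decision tree, the leaf of an ordinary scikit-learn tree defines the neighborhood; the feature coding, smoothing, and q = 2 exponent are assumptions. In practice the tree would be built once, not per distance computation.

import numpy as np
from sklearn.tree import DecisionTreeClassifier

def lvdm_distances(x_test, X_train, y_train, n_values, min_leaf=30, q=2):
    # stand-in neighborhood: training instances falling into the same leaf as the test instance
    tree = DecisionTreeClassifier(min_samples_leaf=min_leaf).fit(X_train, y_train)
    hood = tree.apply(X_train) == tree.apply(x_test.reshape(1, -1))[0]
    Xn, yn = X_train[hood], y_train[hood]
    classes = np.unique(y_train)
    dists = np.zeros(len(X_train))
    for i in range(X_train.shape[1]):
        t = np.ones((n_values[i], len(classes)))          # local P(c | a_i = v), Laplace-smoothed
        for ci, c in enumerate(classes):
            vals, counts = np.unique(Xn[yn == c, i], return_counts=True)
            t[vals, ci] += counts
        t /= t.sum(axis=1, keepdims=True)
        dists += np.sum(np.abs(t[X_train[:, i]] - t[x_test[i]]) ** q, axis=1)
    return dists    # local-VDM-style distance from x_test to every training instance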


International Symposium on Advances in Computation and Intelligence | 2008

A Combined Classification Algorithm Based on C4.5 and NB

Liangxiao Jiang; Chaoqun Li; Jia Wu; Jian Zhu

When the learning task is to build a model with accurate classification, C4.5 and NB are two very important algorithms because of their simplicity and high performance. In this paper, we present a combined classification algorithm based on C4.5 and NB, simply called C4.5-NB. In C4.5-NB, the class probability estimates of C4.5 and NB are weighted according to their classification accuracy on the training data. We experimentally tested C4.5-NB in the Weka system using all 36 UCI data sets selected by Weka and compared it with C4.5 and NB. The experimental results show that C4.5-NB significantly outperforms C4.5 and NB in terms of classification accuracy. In addition, we also examined the ranking performance of C4.5-NB in terms of AUC (the area under the receiver operating characteristic curve); here too, C4.5-NB significantly outperforms C4.5 and NB.
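
A minimal sketch of the combination is given below, with scikit-learn's decision tree and categorical naive Bayes standing in for C4.5 and NB. Each model's class-probability estimate is weighted by its accuracy on the training data, as the abstract describes; integer-coded nominal attributes are assumed.

import numpy as np
from sklearn.tree import DecisionTreeClassifier
from sklearn.naive_bayes import CategoricalNB

def fit_c45_nb(X, y):
    tree = DecisionTreeClassifier().fit(X, y)
    nb = CategoricalNB(min_categories=X.max(axis=0) + 1).fit(X, y)
    w_tree, w_nb = tree.score(X, y), nb.score(X, y)      # weights = classification accuracy on the training data
    return tree, nb, w_tree, w_nb

def predict_proba_c45_nb(model, X):
    tree, nb, w_tree, w_nb = model
    # class columns align because both models sort the unique labels the same way
    p = w_tree * tree.predict_proba(X) + w_nb * nb.predict_proba(X)
    return p / p.sum(axis=1, keepdims=True)              # renormalized combined class-probability estimate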


International Conference on Artificial Neural Networks | 2016

C4.5 or Naive Bayes: A Discriminative Model Selection Approach

Lungan Zhang; Liangxiao Jiang; Chaoqun Li

C4.5 and naive Bayes (NB) are two of the top 10 data mining algorithms thanks to their simplicity, effectiveness, and efficiency. It is well known that NB performs very well on some domains and poorly on others that involve correlated features; C4.5, on the other hand, typically works better than NB on such domains. To integrate their advantages and avoid their disadvantages, many approaches, such as model insertion and model combination, have been proposed. A model insertion approach such as NBTree inserts NB into each leaf of the built decision tree. A model combination approach such as C4.5-NB builds C4.5 and NB on a training dataset independently and then combines their prediction results for an unseen instance. In this paper, we take a new view and propose a discriminative model selection approach. In detail, at training time, C4.5 and NB are built on a training dataset independently, and the more reliable one is recorded for each training instance. At test time, for each test instance, we first find its nearest neighbor and then choose the model recorded as more reliable for that neighbor to predict its class label. We denote the proposed algorithm C4.5||NB. C4.5||NB retains the interpretability of C4.5 and NB, but significantly outperforms C4.5, NB, NBTree, and C4.5-NB.
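
The selection step can be sketched as follows. Which model is "more reliable" for a training instance and which distance defines the nearest neighbor are not specified here, so the sketch uses the predicted probability of the true class and Euclidean distance on the coded attributes as hypothetical stand-ins, with scikit-learn models in place of C4.5 and NB.

import numpy as np
from sklearn.tree import DecisionTreeClassifier
from sklearn.naive_bayes import CategoricalNB

def fit_model_selection(X, y):
    tree = DecisionTreeClassifier().fit(X, y)
    nb = CategoricalNB(min_categories=X.max(axis=0) + 1).fit(X, y)
    idx = np.searchsorted(tree.classes_, y)              # column of each instance's true class
    p_tree = tree.predict_proba(X)[np.arange(len(y)), idx]
    p_nb = nb.predict_proba(X)[np.arange(len(y)), idx]
    prefer_tree = p_tree >= p_nb                         # which model is more reliable on each training instance
    return tree, nb, X, prefer_tree

def predict_model_selection(model, x):
    tree, nb, X_train, prefer_tree = model
    nn = int(np.argmin(np.linalg.norm(X_train - x, axis=1)))   # nearest training neighbor (Euclidean stand-in)
    chosen = tree if prefer_tree[nn] else nb
    return chosen.predict(x.reshape(1, -1))[0]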

Collaboration


Dive into Liangxiao Jiang's collaborations.

Top Co-Authors

Chaoqun Li, China University of Geosciences
Hongwei Li, China University of Geosciences
Lungan Zhang, China University of Geosciences
Shasha Wang, China University of Geosciences
Jia Wu, Macquarie University
Ganggang Kong, China University of Geosciences
Jian Zhu, China University of Geosciences
Zhihua Cai, China University of Geosciences