Xiao-Bo Jin | Researchain

Publication

Featured researches published by Xiao-Bo Jin.

Pattern Recognition | 2010

Regularized margin-based conditional log-likelihood loss for prototype learning

Xiao-Bo Jin; Cheng-Lin Liu; Xinwen Hou

The classification performance of nearest prototype classifiers relies largely on the prototype learning algorithm. The minimum classification error (MCE) method and the soft nearest prototype classifier (SNPC) method are two important algorithms that use a misclassification loss. This paper proposes a new prototype learning algorithm based on the conditional log-likelihood loss (CLL) of a discriminative model called log-likelihood of margin (LOGM). A regularization term is added to avoid over-fitting in training as well as to maximize the hypothesis margin. The CLL in the LOGM algorithm is a convex function of the margin and therefore shows better convergence than the MCE. In addition, we show the effects of distance metric learning with both prototype-dependent weighting and prototype-independent weighting. Our empirical study on benchmark datasets demonstrates that the LOGM algorithm yields higher classification accuracies than the MCE, generalized learning vector quantization (GLVQ), SNPC and robust soft learning vector quantization (RSLVQ). Moreover, the LOGM with prototype-dependent weighting achieves accuracies comparable to those of the support vector machine (SVM) classifier.
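
A rough sketch of what a margin-based conditional log-likelihood loss could look like for a single sample. The abstract does not give the formulas, so everything here is an assumption for illustration: the squared Euclidean distance, the logistic form of the conditional likelihood, the regularizer on the distance to the nearest same-class prototype, and the names `logm_loss`, `xi` and `alpha`.

```python
import math

def sq_dist(x, m):
    """Squared Euclidean distance between a sample and a prototype."""
    return sum((a - b) ** 2 for a, b in zip(x, m))

def logm_loss(x, label, prototypes, xi=1.0, alpha=0.0):
    """Margin-based log-likelihood loss for one sample (illustrative only).

    prototypes: list of (vector, class_label) pairs.
    xi (slope) and alpha (regularization weight) are hypothetical names.
    """
    d_genuine = min(sq_dist(x, m) for m, c in prototypes if c == label)
    d_rival = min(sq_dist(x, m) for m, c in prototypes if c != label)
    margin = d_rival - d_genuine  # hypothesis margin (assumed definition)
    # -log(sigmoid(xi * margin)) = log(1 + exp(-xi * margin)): convex in the margin
    loss = math.log1p(math.exp(-xi * margin))
    # Assumed regularizer: penalize the distance to the nearest same-class prototype
    return loss + alpha * d_genuine
```

A correctly classified sample far from the rival prototype has a large positive margin and a loss near zero; a sample on the decision boundary has margin zero and loss `log 2`.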

Fuzzy Systems and Knowledge Discovery | 2007

Boosting the Performance of Web Spam Detection with Ensemble Under-Sampling Classification

Guanggang Geng; Chun-Heng Wang; Qiudan Li; Lei Xu; Xiao-Bo Jin

Anti-spam has become one of the top challenges for Web search. In this paper, we treat Web spam detection as a binary classification problem. Because reputable pages are easier to obtain than spam pages on the Web, we adopt an ensemble under-sampling classification strategy, which takes full advantage of the information contained in the large number of reputable Websites. The strategy is based on the predicted spamicity of each sub-classifier, in which both content-based and link-based features are taken into account. Experiments on the standard WEBSPAM-UK2006 benchmark show that the ensemble strategy effectively improves Web spam detection performance.
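
The ensemble under-sampling idea can be sketched as follows. This is a minimal illustration, not the paper's procedure: it assumes the majority (reputable) class is split into chunks of roughly the minority (spam) size, that one sub-classifier is trained per chunk plus all spam samples, and that the sub-classifier spamicity scores are simply averaged. The function names are hypothetical.

```python
import random

def undersampled_subsets(majority, minority, seed=0):
    """Split the majority class into chunks of about the minority size and
    pair each chunk with the full minority set."""
    rng = random.Random(seed)
    shuffled = majority[:]
    rng.shuffle(shuffled)
    k = max(1, len(shuffled) // max(1, len(minority)))
    return [(shuffled[i::k], minority) for i in range(k)]

def ensemble_spamicity(sub_classifiers, page):
    """Average the spamicity predicted by every sub-classifier (assumed rule)."""
    scores = [clf(page) for clf in sub_classifiers]
    return sum(scores) / len(scores)
```

Each `(chunk, minority)` pair would be used to train one sub-classifier, so every reputable sample contributes to exactly one balanced training set.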

International Conference on Pattern Recognition | 2010

Multi-class AdaBoost with Hypothesis Margin

Xiao-Bo Jin; Xinwen Hou; Cheng-Lin Liu

Most AdaBoost algorithms for multi-class problems, such as AdaBoost.MH and LogitBoost, have to decompose the multi-class classification into multiple binary problems. This paper proposes a new multi-class AdaBoost algorithm based on the hypothesis margin, called AdaBoost.HM, which directly combines multi-class weak classifiers. Maximizing the hypothesis margin maximizes the output for the positive class while minimizing the maximal output for the negative classes. We discuss the upper bound of the training error of AdaBoost.HM and of a previous multi-class learning algorithm, AdaBoost.M1. Our experiments using feedforward neural networks as weak learners show that the proposed AdaBoost.HM yields higher classification accuracies than AdaBoost.M1 and AdaBoost.MH, while remaining computationally efficient in training.
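
As a small illustration of the hypothesis margin described here, the sketch below computes the output for the true class minus the largest output among the other classes. It assumes the classifier returns one score per class; the function name is hypothetical, and the boosting procedure itself is not shown.

```python
def hypothesis_margin(outputs, true_class):
    """Output for the true class minus the largest output among other classes."""
    rival = max(score for k, score in enumerate(outputs) if k != true_class)
    return outputs[true_class] - rival
```

A positive margin means the true class has the highest score; a negative margin means some other class outscores it.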

International Conference on Pattern Recognition | 2008

Prototype learning with margin-based conditional log-likelihood loss

Xiao-Bo Jin; Cheng-Lin Liu; Xinwen Hou

The classification performance of nearest prototype classifiers relies largely on the prototype learning algorithm, such as learning vector quantization (LVQ) or minimum classification error (MCE). This paper proposes a new prototype learning algorithm based on the minimization of a conditional log-likelihood loss (CLL), called log-likelihood of margin (LOGM). A regularization term is added to avoid over-fitting in training. The CLL in LOGM is a convex function of the margin and therefore gives better convergence than the MCE algorithm. Our empirical study on a large suite of benchmark datasets demonstrates that the proposed algorithm yields higher accuracies than the MCE, the generalized LVQ (GLVQ), and the soft nearest prototype classifier (SNPC).

Fuzzy Systems and Knowledge Discovery | 2012

Multi-partite ranking with multi-class AdaBoost algorithm

Xiao-Bo Jin; Junwei Yu; Dexian Zhang; Guanggang Geng

Learning-to-rank algorithms are traditionally categorized into three classes: point-wise, pair-wise and list-wise. In our work, we focus on the regression-based method for multi-partite ranking problems because of the efficiency of point-wise methods. We propose two ranking algorithms, based on the real AdaBoost and the discrete AdaBoost, which compute the expectation of the ratings from estimates of the pseudo posterior probabilities. We find that the approach can be explained in the framework of regression with the squared loss. It is easier to implement than the previous McRank method, since the algorithm adopts the decision stump as the weak learner instead of the regression tree. On fifteen benchmark datasets, our methods achieve better performance than the pair-wise method RankBoost under the C-index, NDCG and a variant of NDCG. They have lower training time complexity than RankBoost and identical test time complexity.
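
The expected-rating score mentioned here can be sketched in a few lines. The example assumes the per-rating pseudo posterior estimates are already available (how AdaBoost produces them is not shown) and that ratings default to 0, 1, 2, and so on; the function name is hypothetical.

```python
def expected_rating(posteriors, ratings=None):
    """Score = sum_k rating_k * P(rating_k | x), after normalizing the posteriors."""
    if ratings is None:
        ratings = range(len(posteriors))
    total = sum(posteriors)
    return sum(r * p for r, p in zip(ratings, posteriors)) / total
```

Items would then be ranked by sorting on this score, which is what makes the method point-wise.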

Fuzzy Systems and Knowledge Discovery | 2012

Multi-label classification for Oil Authentication

Quan-gong Huo; Xiao-Bo Jin; Hongmei Zhang

Oil authentication substantially affects human life. Traditionally, NIR (near-infrared) analysis is followed by single-label learning or feature transformation to distinguish pure oil from mixed oil. In our work, we adopt the multi-label AdaBoost.RMH algorithm to process chromatographic images of edible oil obtained from high-performance liquid chromatography. Furthermore, we rectify the predictions of the multi-label AdaBoost.RMH with the binary AdaBoost.RMH algorithm. Finally, the detection rate and the accuracy for multi-label classification are proposed to measure the ability of the algorithm to recognize the purity and the composition of the oil, respectively. Experiments on a dataset of 9 kinds of edible oil and their mixtures show that our algorithm (AdaBoost.REC) achieves remarkable improvements over AdaBoost.RMH.

Fuzzy Systems and Knowledge Discovery | 2011

Build decision tree on support vector machine

Dexian Zhang; Xiao-Bo Jin

C4.5 is a popular classification method that produces explainable and intuitive classification rules, but it is prone to overfitting because of data noise or the distribution of the instances. In this paper, we propose a new decision tree method based on the support vector machine (SVM-DTR), which makes the decision surface of the tree separate instances of different categories as far as possible. The SVM is used to measure the importance of each attribute, based on the fact that the cosine of the angle between the attribute axis and the normal of the decision surface quantifies its significance. As in C4.5, each time we choose the most important attribute as the root of the sub-tree. We analyze the influence of the kernel width on the magnitude of the gradient and obtain empirical settings for the kernel width from the experiments. Comparisons between SVM-DTR and C4.5 on 5 datasets from the UCI machine learning repository show that SVM-DTR achieves better performance than C4.5.
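
The cosine-based attribute importance can be illustrated with a small sketch. It assumes the normal of the decision surface is already given as a vector (for a linear SVM this would be the weight vector; for a kernel SVM it would have to come from a gradient, which is not shown). Both function names are hypothetical.

```python
import math

def attribute_importance(normal):
    """Absolute cosine of the angle between each attribute axis and the normal."""
    norm = math.sqrt(sum(w * w for w in normal))
    return [abs(w) / norm for w in normal]

def most_important_attribute(normal):
    """Index of the attribute whose axis is best aligned with the normal."""
    scores = attribute_importance(normal)
    return max(range(len(scores)), key=scores.__getitem__)
```

An attribute whose axis is nearly parallel to the normal changes the decision value fastest, so it would be the candidate for the root of the sub-tree.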

International Conference on Wavelet Analysis and Pattern Recognition | 2007

A hybrid generative-discriminative learning algorithm for Bayesian network structure

Xiao-Bo Jin; Xinwen Hou; Cheng-Lin Liu

The discriminative learning of Bayesian networks improves classification accuracy compared to generative learning. Previous approaches mostly learn either the structure or the parameters in a discriminative manner based on the scoring+search paradigm. Many works have focused on structure learning by optimizing a discriminative scoring function, but the resulting structure is still generative in the sense that the class variable is not conditioned on the attribute variables. On the other hand, searching for the Markov blanket in a constrained space can generate a hybrid generative-discriminative structure. In this paper, we propose a new hybrid generative-discriminative (HGD) algorithm for learning Bayesian network structure. The algorithm searches the neighboring structures by optimizing a cross-validated classification rate (CR) criterion to give a truly discriminative structure. We select the initial structure and design the neighborhood operators so that the learning procedure is computationally feasible. Our empirical study on a large suite of benchmark datasets shows that the proposed HGD+CR algorithm yields better classification results than BN classifiers with only discriminative scores.

arXiv: Information Retrieval | 2013