Publication


Featured research published by Yanshan Xiao.


Knowledge and Information Systems | 2013

SVDD-based outlier detection on uncertain data

Bo Liu; Yanshan Xiao; Longbing Cao; Zhifeng Hao; Feiqi Deng

Outlier detection is an important problem that has been studied within diverse research areas and application domains. Most existing methods are based on the assumption that an example can be exactly categorized as belonging to either the normal class or the outlier class. However, in many real-life applications, data are uncertain in nature due to various errors or incompleteness. This data uncertainty makes the detection of outliers far more difficult than it is for clearly separable data. The key challenge of handling uncertain data in outlier detection is how to reduce the impact of the uncertainty on the learned distinctive classifier. This paper proposes a new SVDD-based approach to detect outliers on uncertain data. The proposed approach operates in two steps. In the first step, a pseudo-training set is generated by assigning a confidence score to each input example, which indicates the likelihood that the example belongs to the normal class. In the second step, the generated confidence scores are incorporated into the support vector data description (SVDD) training phase to construct a global distinctive classifier for outlier detection. In this phase, the contribution of examples with low confidence scores to the construction of the decision boundary is reduced. The experiments show that the proposed approach outperforms state-of-the-art outlier detection techniques.
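
A minimal sketch of the two-step idea described above, under assumptions: scikit-learn's OneClassSVM with an RBF kernel stands in for SVDD, and a LocalOutlierFactor-derived score stands in for the paper's confidence scores, so this is illustrative rather than the authors' exact formulation.

```python
import numpy as np
from sklearn.neighbors import LocalOutlierFactor
from sklearn.svm import OneClassSVM

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 2))          # mostly normal data
X[:10] += 6.0                          # a few contaminated points

# Step 1: pseudo-training set -- a confidence score per example.
lof = LocalOutlierFactor(n_neighbors=20)
lof.fit(X)
# negative_outlier_factor_ is near -1 for inliers, more negative for outliers;
# map it into (0, 1] so inliers get weight near 1 and suspicious points near 0.
raw = -lof.negative_outlier_factor_
confidence = np.clip(2.0 - raw, 1e-3, 1.0)

# Step 2: weighted one-class boundary -- low-confidence examples contribute less.
clf = OneClassSVM(kernel="rbf", gamma="scale", nu=0.1)
clf.fit(X, sample_weight=confidence)

print((clf.predict(X) == -1).sum(), "points flagged as outliers")
```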


IEEE Transactions on Knowledge and Data Engineering | 2014

An Efficient Approach for Outlier Detection with Imperfect Data Labels

Bo Liu; Yanshan Xiao; Philip S. Yu; Zhifeng Hao; Longbing Cao

The task of outlier detection is to identify data objects that are markedly different from or inconsistent with the normal set of data. Most existing solutions typically build a model using the normal data and identify outliers that do not fit the represented model well. However, in addition to normal data, limited negative examples or outliers also exist in many applications, and data may be corrupted such that the outlier detection data is imperfectly labeled. This makes outlier detection far more difficult than in the traditional setting. This paper presents a novel outlier detection approach that addresses data with imperfect labels and incorporates limited abnormal examples into learning. To deal with data with imperfect labels, we introduce likelihood values for each input example, which denote its degree of membership toward the normal and abnormal classes, respectively. Our proposed approach works in two steps. In the first step, we generate a pseudo training dataset by computing likelihood values for each example based on its local behavior. We present a kernel k-means clustering method and a kernel LOF-based method to compute the likelihood values. In the second step, we incorporate the generated likelihood values and limited abnormal examples into an SVDD-based learning framework to build a more accurate classifier for global outlier detection. By integrating local and global outlier detection, our proposed method explicitly handles data with imperfect labels and enhances the performance of outlier detection. Extensive experiments on real-life datasets have demonstrated that our proposed approaches achieve a better tradeoff between detection rate and false alarm rate than state-of-the-art outlier detection approaches.
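
A sketch of the first step (likelihood generation) under the assumption that "kernel LOF" means running LOF on pairwise distances computed in an RBF-kernel feature space; the kernel choice, the mapping from LOF scores to memberships, and the function name are illustrative, not the paper's exact procedure.

```python
import numpy as np
from sklearn.metrics.pairwise import rbf_kernel
from sklearn.neighbors import LocalOutlierFactor

def kernel_lof_likelihoods(X, gamma=0.5, n_neighbors=20):
    # Pairwise distances in the RBF feature space:
    # d(x, y)^2 = k(x, x) + k(y, y) - 2 k(x, y) = 2 - 2 k(x, y) for an RBF kernel.
    K = rbf_kernel(X, X, gamma=gamma)
    D = np.sqrt(np.maximum(2.0 - 2.0 * K, 0.0))
    lof = LocalOutlierFactor(n_neighbors=n_neighbors, metric="precomputed")
    lof.fit(D)
    scores = -lof.negative_outlier_factor_            # ~1 for inliers, larger for outliers
    m_normal = np.clip(1.0 / scores, 0.0, 1.0)        # membership toward the normal class
    m_abnormal = 1.0 - m_normal                       # membership toward the abnormal class
    return m_normal, m_abnormal

X = np.vstack([np.random.randn(100, 2), np.random.randn(5, 2) + 5])
m_norm, m_abn = kernel_lof_likelihoods(X)
print(m_norm[:3], m_abn[-3:])
```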


International Conference on Data Mining | 2009

Multi-sphere Support Vector Data Description for Outliers Detection on Multi-distribution Data

Yanshan Xiao; Bo Liu; Longbing Cao; Xindong Wu; Chengqi Zhang; Zhifeng Hao; Fengzhao Yang; Jie Cao

SVDD has proved to be a powerful tool for outlier detection. However, when detecting outliers on multi-distribution data, that is, data containing several distinct distributions, it is very challenging for SVDD to generate a single hyper-sphere that distinguishes outliers from normal data. Even if such a hyper-sphere can be identified, its performance is usually not good enough. This paper proposes a multi-sphere SVDD approach, named MS-SVDD, for outlier detection on multi-distribution data. First, an adaptive sphere detection method is proposed to detect the data distributions in the dataset. The data is then partitioned according to the identified distributions, and the corresponding SVDD classifiers are constructed separately. Substantial experiments on both artificial and real-world datasets have demonstrated that the proposed approach outperforms the original SVDD.
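
A rough sketch of the MS-SVDD intent: partition the data into its apparent distributions and fit one data description per partition, flagging a point only if every local model rejects it. KMeans stands in for the paper's adaptive sphere-detection step and OneClassSVM for SVDD, both assumptions for illustration.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.svm import OneClassSVM

def fit_multi_sphere(X, n_spheres=2, nu=0.05):
    km = KMeans(n_clusters=n_spheres, n_init=10, random_state=0).fit(X)
    models = []
    for c in range(n_spheres):
        svdd = OneClassSVM(kernel="rbf", gamma="scale", nu=nu)
        svdd.fit(X[km.labels_ == c])               # one description per distribution
        models.append(svdd)
    return models

def predict_multi_sphere(models, X):
    # +1 (normal) if at least one sphere accepts the point, else -1 (outlier).
    votes = np.column_stack([m.predict(X) for m in models])
    return np.where((votes == 1).any(axis=1), 1, -1)

X = np.vstack([np.random.randn(100, 2), np.random.randn(100, 2) + 8])
models = fit_multi_sphere(X, n_spheres=2)
print(predict_multi_sphere(models, np.array([[0, 0], [8, 8], [4, 4]])))
```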


IEEE Transactions on Knowledge and Data Engineering | 2014

Uncertain One-Class Learning and Concept Summarization Learning on Uncertain Data Streams

Bo Liu; Yanshan Xiao; Philip S. Yu; Longbing Cao; Yun Zhang; Zhifeng Hao

This paper presents a novel framework for uncertain one-class learning and concept summarization learning on uncertain data streams. Our proposed framework consists of two parts. First, we put forward uncertain one-class learning to cope with data uncertainty. We propose a local kernel-density-based method to generate a bound score for each instance, which refines the location of the corresponding instance, and then construct an uncertain one-class classifier (UOCC) by incorporating the generated bound scores into a one-class SVM-based learning phase. Second, we propose a support vectors (SVs)-based clustering technique to summarize the user's concept from the history chunks: each chunk is represented by the support vectors of the uncertain one-class classifier developed on it, and the k-means clustering method is extended to cluster the history chunks so that their concept can be summarized. Our proposed framework explicitly addresses the problem of one-class learning and concept summarization learning on uncertain one-class data streams. Extensive experiments on uncertain data streams demonstrate that our proposed uncertain one-class learning method performs better than existing methods, and our concept summarization method can summarize the evolving interests of the user from the history chunks.
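
A condensed sketch of the second part (concept summarization), under assumptions: OneClassSVM approximates the per-chunk UOCC, the uncertainty handling (bound scores) is omitted, and plain k-means over the accumulated support vectors stands in for the extended clustering step.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.svm import OneClassSVM

def summarize_stream(chunks, n_concepts=3):
    history_svs = []
    for chunk in chunks:
        occ = OneClassSVM(kernel="rbf", gamma="scale", nu=0.1).fit(chunk)
        history_svs.append(occ.support_vectors_)    # compact representation of the chunk
    all_svs = np.vstack(history_svs)
    concepts = KMeans(n_clusters=n_concepts, n_init=10, random_state=0).fit(all_svs)
    return concepts.cluster_centers_                # summarized concepts over the history

rng = np.random.default_rng(0)
chunks = [rng.normal(loc=i % 3, size=(200, 2)) for i in range(9)]   # drifting stream
print(summarize_stream(chunks))
```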


IEEE Transactions on Systems, Man, and Cybernetics | 2014

A Similarity-Based Classification Framework for Multiple-Instance Learning

Yanshan Xiao; Bo Liu; Zhifeng Hao; Longbing Cao

Multiple-instance learning (MIL) is a generalization of supervised learning that attempts to learn useful information from bags of instances. In MIL, the true labels of instances in positive bags are not available for training. This leads to a critical challenge, namely, handling instances whose labels are ambiguous (ambiguous instances). To deal with these ambiguous instances, we propose a novel MIL approach, called similarity-based multiple-instance learning (SMILE). Instead of eliminating a number of ambiguous instances in positive bags from classifier training, as done in some previous MIL works, SMILE explicitly deals with the ambiguous instances by considering their similarity to the positive class and the negative class. Specifically, a subset of instances is selected from the positive bags as positive candidates, and the remaining ambiguous instances are associated with two similarity weights, representing their similarity to the positive class and the negative class, respectively. The ambiguous instances, together with their similarity weights, are thereafter incorporated into the learning phase to build an extended SVM-based predictive classifier. A heuristic framework is employed to update the positive candidates and the similarity weights for refining the classification boundary. Experiments on real-world datasets show that SMILE achieves highly competitive classification accuracy and is less sensitive to labeling noise than existing MIL methods.
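
An illustrative sketch of the similarity-weight idea: pick one positive candidate per positive bag (here, the instance farthest from the negative instances, which is an assumption) and give each remaining ambiguous instance two weights reflecting its closeness to the positive candidates and to the negative instances. The paper's actual selection and weighting schemes, and the iterative updates, differ.

```python
import numpy as np
from sklearn.metrics.pairwise import euclidean_distances

def similarity_weights(positive_bags, negative_instances):
    neg = np.asarray(negative_instances)
    candidates, ambiguous = [], []
    for bag in positive_bags:
        bag = np.asarray(bag)
        d_neg = euclidean_distances(bag, neg).min(axis=1)
        idx = int(d_neg.argmax())                    # most "positive-looking" instance
        candidates.append(bag[idx])
        ambiguous.extend(np.delete(bag, idx, axis=0))
    pos, amb = np.asarray(candidates), np.asarray(ambiguous)
    d_pos = euclidean_distances(amb, pos).min(axis=1)
    d_neg = euclidean_distances(amb, neg).min(axis=1)
    w_pos = d_neg / (d_pos + d_neg)                  # similarity toward the positive class
    w_neg = d_pos / (d_pos + d_neg)                  # similarity toward the negative class
    return pos, amb, w_pos, w_neg

pos_bags = [np.random.randn(5, 2) + 2, np.random.randn(4, 2) + 2]
neg_inst = np.random.randn(30, 2)
pos, amb, w_pos, w_neg = similarity_weights(pos_bags, neg_inst)
```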


International Joint Conference on Artificial Intelligence | 2011

Similarity-based approach for positive and unlabelled learning

Yanshan Xiao; Bo Liu; Jie Yin; Longbing Cao; Chengqi Zhang; Zhifeng Hao

Positive and unlabelled learning (PU learning) has been investigated to deal with the situation where only positive examples and unlabelled examples are available. Most previous works focus on identifying some negative examples from the unlabelled data so that supervised learning methods can be applied to build a classifier. However, the remaining unlabelled data, which cannot be explicitly identified as positive or negative (we call them ambiguous examples), are either excluded from the training phase or simply forced into one of the two classes. Consequently, the performance of these methods may be constrained. This paper proposes a novel approach, called similarity-based PU learning (SPUL), which associates the ambiguous examples with two similarity weights indicating the similarity of an ambiguous example towards the positive class and the negative class, respectively. Local similarity-based and global similarity-based mechanisms are proposed to generate the similarity weights. The ambiguous examples and their similarity weights are thereafter incorporated into an SVM-based learning phase to build a more accurate classifier. Extensive experiments on real-world datasets have shown that SPUL outperforms state-of-the-art PU learning methods.
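
A heavily simplified, assumption-laden sketch of an SPUL-style pipeline: split the unlabelled set into reliable negatives and ambiguous examples with a plain distance heuristic, attach similarity weights to the ambiguous ones, and train a weighted SVM. The paper's local and global weighting mechanisms are replaced here by a nearest-distance ratio, so this only mirrors the overall shape of the method.

```python
import numpy as np
from sklearn.metrics.pairwise import euclidean_distances
from sklearn.svm import SVC

def spul_like(positives, unlabelled, neg_fraction=0.3):
    P, U = np.asarray(positives), np.asarray(unlabelled)
    d_pos = euclidean_distances(U, P).min(axis=1)
    order = np.argsort(-d_pos)                       # farthest from positives first
    n_neg = int(neg_fraction * len(U))
    N, A = U[order[:n_neg]], U[order[n_neg:]]        # reliable negatives vs. ambiguous
    # Similarity weights of ambiguous examples toward each class.
    d_p = euclidean_distances(A, P).min(axis=1)
    d_n = euclidean_distances(A, N).min(axis=1)
    w_pos, w_neg = d_n / (d_p + d_n), d_p / (d_p + d_n)
    # Each ambiguous example appears once per class, weighted by its similarity.
    X = np.vstack([P, N, A, A])
    y = np.concatenate([np.ones(len(P)), -np.ones(len(N)),
                        np.ones(len(A)), -np.ones(len(A))])
    w = np.concatenate([np.ones(len(P)), np.ones(len(N)), w_pos, w_neg])
    return SVC(kernel="rbf", gamma="scale").fit(X, y, sample_weight=w)

P = np.random.randn(50, 2) + 3
U = np.vstack([np.random.randn(100, 2) + 3, np.random.randn(100, 2) - 3])
clf = spul_like(P, U)
```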


Applied Intelligence | 2014

A K-Farthest-Neighbor-based approach for support vector data description

Yanshan Xiao; Bo Liu; Zhifeng Hao; Longbing Cao

Support vector data description (SVDD) is a well-known technique for one-class classification problems. However, it incurs high time complexity when handling large-scale datasets. In this paper, we propose a novel approach, named K-Farthest-Neighbor-based Concept Boundary Detection (KFN-CBD), to improve the training efficiency of SVDD. KFN-CBD aims to identify the examples lying close to the boundary of the target class, and these examples, instead of the entire dataset, are then used to learn the classifier. Extensive experiments have shown that KFN-CBD obtains substantial speedup over standard SVDD while maintaining accuracy comparable to training on the entire dataset.
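
A minimal sketch of the KFN-CBD intent, under the assumption that examples with the largest mean distance to their k farthest neighbors lie near the class boundary; only that subset is then used to train the one-class model. The paper's actual boundary-detection procedure is more involved, so treat this as an approximation of the idea rather than the method itself.

```python
import numpy as np
from sklearn.metrics.pairwise import euclidean_distances
from sklearn.svm import OneClassSVM

def kfn_boundary_subset(X, k=10, keep_fraction=0.2):
    D = euclidean_distances(X, X)
    kfn_mean = np.sort(D, axis=1)[:, -k:].mean(axis=1)   # mean distance to k farthest neighbors
    n_keep = max(1, int(keep_fraction * len(X)))
    return X[np.argsort(-kfn_mean)[:n_keep]]              # keep the apparent boundary examples

X = np.random.randn(2000, 10)
boundary = kfn_boundary_subset(X)                          # ~20% of the data
svdd = OneClassSVM(kernel="rbf", gamma="scale", nu=0.05).fit(boundary)
```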


International Conference on Data Mining | 2010

Exploiting Local Data Uncertainty to Boost Global Outlier Detection

Bo Liu; Jie Yin; Yanshan Xiao; Longbing Cao; Philip S. Yu

This paper presents a novel hybrid approach to outlier detection that incorporates local data uncertainty into the construction of a global classifier. To deal with local data uncertainty, we introduce a confidence value for each example in the training data, which measures the strength of the corresponding class label. Our proposed method works in two steps. First, we generate a pseudo training dataset by computing a confidence value for each input example on its class label. We present two different mechanisms, a kernel k-means clustering algorithm and a kernel LOF-based algorithm, to compute the confidence values based on the local data behavior. Second, we construct a global classifier for outlier detection by generalizing the SVDD-based learning framework to incorporate both positive and negative examples as well as their associated confidence values. By integrating local and global outlier detection, our proposed method explicitly handles the uncertainty of the input data and reduces the sensitivity of SVDD to noise. Extensive experiments on real-life datasets demonstrate that our proposed method achieves a better tradeoff between detection rate and false alarm rate than four state-of-the-art outlier detection algorithms.
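
A sketch of the clustering-based confidence mechanism, with plain k-means standing in for the paper's kernel k-means (an assumption): an example's confidence in its label shrinks as it moves away from its cluster center, and the resulting values become sample weights for the global SVDD-style classifier.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.svm import OneClassSVM

def cluster_confidence(X, n_clusters=3):
    km = KMeans(n_clusters=n_clusters, n_init=10, random_state=0).fit(X)
    d = np.linalg.norm(X - km.cluster_centers_[km.labels_], axis=1)
    conf = np.empty(len(X))
    for c in range(n_clusters):
        mask = km.labels_ == c
        # Normalize within each cluster so confidences lie in (0, 1].
        conf[mask] = 1.0 - d[mask] / (d[mask].max() + 1e-12)
    return np.clip(conf, 1e-3, 1.0)

X = np.random.randn(300, 2)
weights = cluster_confidence(X)
clf = OneClassSVM(kernel="rbf", gamma="scale", nu=0.1).fit(X, sample_weight=weights)
```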


International Conference on Data Mining | 2010

SMILE: A Similarity-Based Approach for Multiple Instance Learning

Yanshan Xiao; Bo Liu; Longbing Cao; Jie Yin; Xindong Wu

Multiple instance learning (MIL) is a generalization of supervised learning that attempts to learn useful information from bags of instances. In MIL, the true labels of the instances in positive bags are not always available for training. This leads to a critical challenge, namely, handling the ambiguity of instance labels in positive bags. To address this issue, this paper proposes a novel MIL method named SMILE (Similarity-based Multiple Instance LEarning). It assigns each instance in a positive bag a similarity weight, which represents the instance's similarity towards the positive and negative classes. The instances in positive bags, together with their similarity weights, are thereafter incorporated into the learning phase to build an extended SVM-based predictive classifier. Experiments on three real-world datasets consisting of 12 subsets show that SMILE achieves markedly better classification accuracy than state-of-the-art MIL methods.
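
Given an instance-level classifier built from the weighted instances (see the sketch after the journal version above), bag-level prediction in MIL conventionally labels a bag positive if any of its instances is predicted positive. The helper below just applies that standard rule and is not specific to SMILE; it assumes a classifier with +1/-1 outputs.

```python
import numpy as np

def predict_bags(instance_clf, bags):
    # A bag is positive (+1) if at least one instance is predicted positive.
    return np.array([1 if (instance_clf.predict(np.asarray(bag)) == 1).any() else -1
                     for bag in bags])
```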


Knowledge and Information Systems | 2014

An efficient orientation distance–based discriminative feature extraction method for multi-classification

Bo Liu; Yanshan Xiao; Philip S. Yu; Zhifeng Hao; Longbing Cao

Feature extraction is an important step before actual learning. Although many feature extraction methods have been proposed for clustering, classification and regression, very limited work has been done on multi-class classification problems. This paper proposes a novel feature extraction method, called orientation distance–based discriminative (ODD) feature extraction, particularly designed for multi-class classification problems. Our proposed method works in two steps. In the first step, we extend the Fisher Discriminant idea to determine an appropriate kernel function and map the input data of all classes into a feature space where the classes are well separated. In the second step, we put forward two variants of ODD features, i.e., one-vs-all-based and one-vs-one-based ODD features. We first construct hyper-planes (SVMs) based on the one-vs-all or one-vs-one scheme in the feature space; we then extract the corresponding ODD features between a sample and each hyper-plane. These newly extracted ODD features are treated as the representative features and are thereafter used in the subsequent classification phase. Extensive experiments have been conducted to investigate the performance of one-vs-all-based and one-vs-one-based ODD features for multi-class classification. The statistical results show that classification based on ODD features outperforms state-of-the-art feature extraction methods in accuracy.
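
A hedged sketch of the one-vs-one variant of the second step: train a linear SVM for every pair of classes and use the signed distance of a sample to each pairwise hyper-plane as its new feature vector for a downstream classifier. The Fisher-discriminant-based kernel selection from the paper's first step is omitted, and the function name is illustrative.

```python
import numpy as np
from itertools import combinations
from sklearn.svm import LinearSVC

def odd_one_vs_one_features(X_train, y_train, X):
    feats = []
    for a, b in combinations(np.unique(y_train), 2):
        mask = np.isin(y_train, [a, b])
        svm = LinearSVC(C=1.0, max_iter=5000).fit(X_train[mask], y_train[mask])
        # Signed distance to the (a vs b) hyper-plane: f(x) / ||w||.
        feats.append(svm.decision_function(X) / np.linalg.norm(svm.coef_))
    return np.column_stack(feats)                    # one column per class pair

X = np.random.randn(150, 4)
y = np.repeat([0, 1, 2], 50)
features = odd_one_vs_one_features(X, y, X)          # shape (150, 3)
```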

Collaboration


Dive into Yanshan Xiao's collaborations.

Top Co-Authors

Bo Liu (Guangdong University of Technology)
Zhifeng Hao (Guangdong University of Technology)
Philip S. Yu (University of Illinois at Chicago)
Feiqi Deng (South China University of Technology)
Yun Zhang (Guangdong University of Technology)
Jie Yin (Commonwealth Scientific and Industrial Research Organisation)
Xindong Wu (University of Louisiana at Lafayette)
Fengzhao Yang (Nanjing University of Finance and Economics)
Jie Cao (Nanjing University of Finance and Economics)
Yibang Ruan (Guangdong University of Technology)