Xiaofei Xu

Harbin Institute of Technology

Publication

Featured research published by Xiaofei Xu.

Pattern Recognition Letters | 2003

Discovering cluster-based local outliers

Zengyou He; Xiaofei Xu; Shengchun Deng

In this paper, we present a new definition of outlier, the cluster-based local outlier, which is meaningful and gives importance to the local data behavior. We design a measure of the physical significance of an outlier, called the cluster-based local outlier factor (CBLOF), and propose the FindCBLOF algorithm for discovering outliers. The experimental results show that our approach outperforms existing methods at identifying meaningful and interesting outliers.
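
The CBLOF idea in this abstract can be illustrated with a small sketch. It is not the authors' FindCBLOF implementation: it assumes the clustering has already been done (labels are given), uses Euclidean distance to cluster centroids on numeric data, and uses parameters alpha and beta (assumed defaults 0.9 and 5) to split clusters into "large" and "small" ones. A point in a large cluster is scored by its distance to its own cluster, a point in a small cluster by its distance to the nearest large cluster, and either distance is weighted by the size of the point's cluster.

```python
import math
from collections import defaultdict


def cblof_scores(points, labels, alpha=0.9, beta=5.0):
    """CBLOF-style scores for pre-clustered numeric points (higher = more outlying).

    Assumptions for this sketch: labels come from some earlier clustering,
    and distance is Euclidean distance to a cluster centroid.
    """
    members = defaultdict(list)  # cluster label -> list of point indices
    for i, lab in enumerate(labels):
        members[lab].append(i)
    dim = len(points[0])
    centroid = {
        lab: [sum(points[i][d] for i in idx) / len(idx) for d in range(dim)]
        for lab, idx in members.items()
    }
    # Sort clusters from largest to smallest.
    order = sorted(members, key=lambda lab: len(members[lab]), reverse=True)
    sizes = [len(members[lab]) for lab in order]
    # Boundary b: first split where the leading clusters cover >= alpha of the
    # data, or the size ratio to the next cluster is >= beta. If neither
    # condition ever holds, every cluster is treated as large.
    b, running = len(order), 0
    for k in range(len(order) - 1):
        running += sizes[k]
        if running >= alpha * len(points) or sizes[k] / sizes[k + 1] >= beta:
            b = k + 1
            break
    large = set(order[:b])
    scores = []
    for i, lab in enumerate(labels):
        if lab in large:  # distance to the point's own (large) cluster
            d = math.dist(points[i], centroid[lab])
        else:  # distance to the nearest large cluster
            d = min(math.dist(points[i], centroid[c]) for c in large)
        scores.append(len(members[lab]) * d)  # weight by the point's cluster size
    return scores


if __name__ == "__main__":
    pts = [(i * 0.01, 0.0) for i in range(10)] + [(10.0, 10.0)]
    labs = [0] * 10 + [1]
    scores = cblof_scores(pts, labs)
    print(max(range(len(pts)), key=scores.__getitem__))  # 10
```

Here the isolated point forms a small cluster far from the only large cluster, so it receives the highest score.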

International Conference on Intelligent Computing | 2005

An optimization model for outlier detection in categorical data

Zengyou He; Shengchun Deng; Xiaofei Xu

In this paper, we formally define the problem of outlier detection in categorical data as an optimization problem from a global viewpoint. Moreover, we present an algorithm based on a local-search heuristic for efficiently finding feasible solutions. Experimental results on real datasets and large synthetic datasets demonstrate the superiority of our model and algorithm.

Knowledge Discovery and Data Mining | 2006

A fast greedy algorithm for outlier mining

Zengyou He; Shengchun Deng; Xiaofei Xu; Joshua Zhexue Huang

The task of outlier detection is to find small groups of data objects that are exceptional when compared with the much larger remainder of the data. Recently, the problem of outlier detection in categorical data was defined as an optimization problem, and a local-search heuristic algorithm (LSA) was presented. However, as with most iterative algorithms, LSA is still very time-consuming on very large datasets. In this paper, we present a very fast greedy algorithm for mining outliers under the same optimization model. Experimental results on real datasets and large synthetic datasets show that (1) our new algorithm performs comparably to state-of-the-art outlier detection algorithms at identifying true outliers, and (2) our algorithm can be an order of magnitude faster than the LSA algorithm.
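
The abstract does not restate the optimization model, so the sketch below assumes an entropy-based formulation: choose k objects whose removal minimizes the summed per-attribute entropy of the remaining data (attributes treated as independent). Each greedy step removes the single object whose removal lowers that entropy the most. This naive version recomputes the entropy for every candidate and is meant only to show the greedy idea; it is not the paper's optimized algorithm, and the function names are illustrative.

```python
import math
from collections import Counter


def dataset_entropy(counts, n):
    """Sum of per-attribute entropies, assuming attributes are independent."""
    total = 0.0
    for col in counts:
        for c in col.values():
            if c:  # skip values whose count has dropped to zero
                p = c / n
                total -= p * math.log2(p)
    return total


def greedy_outliers(rows, k):
    """Greedily pick k row indices whose removal minimizes the dataset entropy."""
    m = len(rows[0])
    counts = [Counter(r[j] for r in rows) for j in range(m)]  # value counts per attribute
    remaining = set(range(len(rows)))
    outliers = []
    for _ in range(k):
        n = len(remaining) - 1  # size of the data after removing one more object
        best, best_e = None, math.inf
        for i in sorted(remaining):
            for j in range(m):  # tentatively remove row i
                counts[j][rows[i][j]] -= 1
            e = dataset_entropy(counts, n)
            for j in range(m):  # restore row i
                counts[j][rows[i][j]] += 1
            if e < best_e:
                best, best_e = i, e
        for j in range(m):  # permanently remove the chosen row
            counts[j][rows[best][j]] -= 1
        remaining.remove(best)
        outliers.append(best)
    return outliers


if __name__ == "__main__":
    rows = [("a", "x")] * 5 + [("b", "y")]
    print(greedy_outliers(rows, 1))  # [5]
```

Removing the lone ("b", "y") row leaves a perfectly uniform dataset with zero entropy, so it is chosen first.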

Information Fusion | 2005

A cluster ensemble method for clustering categorical data

Zengyou He; Xiaofei Xu; Shengchun Deng

Categorical data clustering (CDC) and cluster ensemble (CE) have long been considered separate research and application areas. The main focus of this paper is to investigate the commonalities between these two problems and to use them to create new clustering algorithms for categorical data through cross-fertilization between the two disjoint research fields. More precisely, we formally define the CDC problem as an optimization problem from the viewpoint of CE and apply the CE approach to clustering categorical data. Experimental results on real datasets show that the CE-based clustering method is competitive with existing CDC algorithms with respect to clustering accuracy.

Expert Systems With Applications | 2004

Mining class outliers: concepts, algorithms and applications in CRM

Zengyou He; Xiaofei Xu; Joshua Zhexue Huang; Shengchun Deng

Outliers, commonly referred to as exceptional cases, exist in many real-world databases. Detecting such outliers is important for many applications and has recently attracted much attention from the data mining research community. However, most existing methods are designed for mining outliers from a single dataset without considering the class labels of data objects. In this paper, we consider the class outlier detection problem: "given a set of observations with class labels, find those that arouse suspicions, taking into account the class labels". By generalizing two pioneering contributions [Proc WAIM02 (2002); Proc SSTD03] in this field, we develop the notion of class outlier and propose practical solutions by extending existing outlier detection algorithms to this case. Furthermore, potential applications in CRM (customer relationship management) are discussed. Finally, experiments on real datasets show that our method can find interesting outliers and is of practical use.

Expert Systems With Applications | 2005

Mining action rules from scratch

Zengyou He; Xiaofei Xu; Shengchun Deng; Ronghua Ma

Action rules suggest to a business user what actions (i.e., changes to the values of some flexible attributes) should be taken to improve the profitability of customers, that is, actions that re-classify some customers from a less desired decision class to a more desired one. In previous work, however, each action rule was constructed from two previously extracted rules defining different profitability classes. In this paper, we take a first step towards formally introducing the problem of mining action rules from scratch and present formal definitions. In contrast to previous work, our formulation guarantees the completeness and correctness of the discovered action rules. In addition to formulating the problem from an inductive learning viewpoint, we provide a theoretical analysis of the complexity of the problem and its variations. Furthermore, we present efficient algorithms for mining action rules from scratch. In an experimental study, we demonstrate the usefulness of our techniques.

Computational Intelligence and Security | 2005

Improving k-modes algorithm considering frequencies of attribute values in mode

Zengyou He; Shengchun Deng; Xiaofei Xu

In this paper, we present an experimental study of applying a new dissimilarity measure to the k-modes clustering algorithm to improve its clustering accuracy. The measure is based on the idea that the similarity between a data object and a cluster mode is directly proportional to the sum of the relative frequencies of the common values in the mode. Experimental results on real-life datasets show that the modified algorithm is superior to the original k-modes algorithm with respect to clustering accuracy.
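
One reading of the measure described above, sketched under assumptions: a mismatch on an attribute costs 1, while a match costs 1 minus the relative frequency of that mode value within the cluster. Objects that agree with the cluster's most common values are therefore judged closest, whereas plain simple matching would score every match as 0. The helper names are hypothetical, and only the dissimilarity is shown, not the full k-modes assignment and update loop.

```python
from collections import Counter


def cluster_mode(rows):
    """Return a cluster's mode and the relative frequency of each mode value."""
    mode, freq = [], []
    for j in range(len(rows[0])):
        value, count = Counter(r[j] for r in rows).most_common(1)[0]
        mode.append(value)
        freq.append(count / len(rows))
    return mode, freq


def freq_dissimilarity(x, mode, freq):
    """Mismatch costs 1; a match costs 1 - relative frequency of the mode value."""
    return sum(1.0 if xj != qj else 1.0 - fj for xj, qj, fj in zip(x, mode, freq))


if __name__ == "__main__":
    cluster = [("a", "x"), ("a", "x"), ("a", "y")]
    mode, freq = cluster_mode(cluster)  # mode ['a', 'x'], freq [1.0, 0.666...]
    print(round(freq_dissimilarity(("a", "x"), mode, freq), 3))  # 0.333
    print(freq_dissimilarity(("b", "z"), mode, freq))  # 2.0
```

Matching "a" (shared by every member) contributes nothing, while matching "x" (shared by two of three members) still contributes 1/3.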

Web Age Information Management | 2004

A Frequent Pattern Discovery Method for Outlier Detection

Zengyou He; Xiaofei Xu; Joshua Zhexue Huang; Shengchun Deng

An outlier in a dataset is an observation or a point that is considerably dissimilar to or inconsistent with the remainder of the data. Detecting outliers is important for many applications and has recently attracted much attention in the data mining research community. In this paper, we present a new method that detects outliers by discovering frequent patterns (or frequent itemsets) in the dataset. Outliers are defined as the transactions that contain fewer frequent patterns in their itemsets. We define a measure called FPOF (Frequent Pattern Outlier Factor) to detect outlier transactions and propose the FindFPOF algorithm to discover outliers. The experimental results show that our approach outperforms existing methods at identifying interesting outliers.
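
A minimal sketch of the FPOF score, assuming FPOF(t) is the sum of the supports of the frequent itemsets contained in transaction t, divided by the total number of frequent itemsets. Transactions with the lowest scores are then the outlier candidates. Frequent itemsets are found here by brute-force enumeration up to a hypothetical max_len, a stand-in for a real miner such as Apriori; this is not the authors' FindFPOF code.

```python
from itertools import combinations


def frequent_itemsets(transactions, minsup, max_len=3):
    """Brute-force frequent itemsets (support >= minsup) up to size max_len."""
    n = len(transactions)
    items = sorted(set().union(*transactions))
    freq = {}
    for size in range(1, max_len + 1):
        found = False
        for cand in combinations(items, size):
            s = frozenset(cand)
            sup = sum(1 for t in transactions if s <= t) / n
            if sup >= minsup:
                freq[s] = sup
                found = True
        if not found:  # no frequent set of this size means none larger either
            break
    return freq


def fpof(transactions, minsup, max_len=3):
    """FPOF per transaction: lower values mark likely outliers."""
    fps = frequent_itemsets(transactions, minsup, max_len)
    if not fps:
        return [0.0] * len(transactions)
    return [sum(sup for s, sup in fps.items() if s <= t) / len(fps)
            for t in transactions]


if __name__ == "__main__":
    tx = [{"a", "b"}, {"a", "b"}, {"a", "b"}, {"a", "b"}, {"c"}]
    print([round(v, 2) for v in fpof(tx, 0.5)])  # [0.8, 0.8, 0.8, 0.8, 0.0]
```

The transaction {"c"} contains none of the frequent itemsets {a}, {b}, {a, b}, so it gets the minimum score.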

International Journal of Intelligent Systems | 2005

Scalable algorithms for clustering large datasets with mixed type attributes

Zengyou He; Xiaofei Xu; Shengchun Deng

Clustering is a widely used technique in data mining applications for discovering patterns in underlying data. Most traditional clustering algorithms are limited to handling datasets that contain either numeric or categorical attributes. However, datasets with mixed types of attributes are common in real-life data mining applications. In this article, we present two algorithms that extend the Squeezer algorithm to domains with mixed numeric and categorical attributes. The performance of the two algorithms has been studied on real and artificially generated datasets. Comparisons with other clustering algorithms illustrate the superiority of our approaches.
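
The two mixed-attribute extensions are not detailed in the abstract, but the one-pass Squeezer idea they build on can be sketched for categorical data only. Each record joins the most similar existing cluster if that similarity reaches a threshold s, and otherwise starts a new cluster. This sketch assumes the similarity is the sum, over attributes, of the fraction of the cluster's members sharing the record's value; the function name and threshold parameter are illustrative, and numeric attributes are not handled.

```python
from collections import Counter


def squeezer(rows, s):
    """One-pass Squeezer-style clustering of categorical rows; returns cluster labels."""
    summaries = []  # per cluster: one Counter of value counts per attribute
    sizes = []      # number of rows in each cluster
    labels = []
    for row in rows:
        best, best_sim = None, -1.0
        for c, summ in enumerate(summaries):
            # Assumed similarity: sum of relative frequencies of the row's values.
            sim = sum(summ[i][v] / sizes[c] for i, v in enumerate(row))
            if sim > best_sim:
                best, best_sim = c, sim
        if best is None or best_sim < s:  # too dissimilar: open a new cluster
            summaries.append([Counter() for _ in row])
            sizes.append(0)
            best = len(summaries) - 1
        for i, v in enumerate(row):  # update the chosen cluster's summary
            summaries[best][i][v] += 1
        sizes[best] += 1
        labels.append(best)
    return labels


if __name__ == "__main__":
    rows = [("a", "x"), ("a", "x"), ("b", "y"), ("b", "y")]
    print(squeezer(rows, 1.5))  # [0, 0, 1, 1]
```

Because each record is read once and only cluster summaries are kept, the pass scales linearly with the number of records, which is the property the mixed-type extensions aim to preserve.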

Web Age Information Management | 2002

Outlier Detection Integrating Semantic Knowledge

Zengyou He; Shengchun Deng; Xiaofei Xu

Existing proposals for outlier detection do not take the semantic knowledge of the dataset into consideration. They only try to find outliers from the dataset itself, which prevents them from finding more meaningful outliers. In this paper, we consider the problem of outlier detection integrating semantic knowledge. We introduce a new definition of outlier, the semantic outlier: a data point that behaves differently from other data points in the same class. We present a measure of the degree to which each object is an outlier, called the semantic outlier factor (SOF), and propose an efficient algorithm for mining semantic outliers based on SOF. Experimental results show that meaningful and interesting outliers can be found with our method.
