Yuefei Sui
Chinese Academy of Sciences
Publication
Featured research published by Yuefei Sui.
Expert Systems With Applications | 2010
Feng Jiang; Yuefei Sui; Cungen Cao
Information entropy, developed by Shannon in information theory, gives an effective measure of uncertainty for a given system. It also appears to be a competitive mechanism for measuring uncertainty in rough sets. Many researchers have applied information entropy to rough sets and proposed different information entropy models. In particular, Düntsch et al. presented a well-justified information entropy model for measuring uncertainty in rough sets. In this paper, we demonstrate the application of this model to a specific data mining problem: outlier detection. By virtue of Düntsch's information entropy model, we propose a novel definition of outliers in rough sets: IE (information entropy)-based outliers. An algorithm to find such outliers is also given. The effectiveness of the IE-based method for outlier detection is demonstrated on two publicly available data sets.
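As a rough illustration of the kind of quantity involved, the sketch below computes the Shannon entropy of the partition that a set of attributes induces on a table of objects. It is a minimal sketch, not the model or the outlier algorithm from the paper: the function name `partition_entropy`, the dictionary-based table layout, and the toy data are all assumptions made for this example.

```python
from collections import Counter
from math import log2


def partition_entropy(rows, attrs):
    """Shannon entropy (in bits) of the partition induced on `rows` by `attrs`.

    Hypothetical helper: two objects belong to the same equivalence class
    when they agree on every attribute listed in `attrs`.
    """
    # Count how many objects fall into each equivalence class.
    class_sizes = Counter(tuple(row[a] for a in attrs) for row in rows)
    n = len(rows)
    # Standard Shannon entropy over the relative class sizes.
    return -sum((size / n) * log2(size / n) for size in class_sizes.values())


# Hypothetical toy table: each dict describes one object.
table = [
    {"color": "red", "size": "small"},
    {"color": "red", "size": "large"},
    {"color": "blue", "size": "small"},
    {"color": "blue", "size": "large"},
]

coarse = partition_entropy(table, ["color"])          # two classes of two objects
fine = partition_entropy(table, ["color", "size"])    # four singleton classes
```

On this toy table the finer partition has the higher entropy, which matches the intuition that more attributes distinguish more objects.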
International Journal of General Systems | 2008
Feng Jiang; Yuefei Sui; Cungen Cao
“One person's noise is another person's signal” (Knorr and Ng 1998). In recent years, much attention has been given to the problem of outlier detection, whose aim is to detect outliers: objects that behave in an unexpected way or have abnormal properties. Detecting such outliers is important for many applications, such as identifying criminal activities in electronic commerce, computer intrusion attacks, terrorist threats, and agricultural pest infestations. In this paper, we propose exploiting the framework of rough sets for detecting outliers. We propose a novel definition of outliers, RMF (rough membership function)-based outliers, by virtue of the notion of rough membership function in rough set theory. An algorithm to find such outliers is also given. The effectiveness of the RMF-based method is demonstrated on two publicly available data sets.
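For readers unfamiliar with the rough membership function, the sketch below shows its standard definition from rough set theory: the fraction of an object's equivalence class that lies inside a target set. It is only a sketch of that building block, not the RMF-based outlier definition or algorithm from the paper. The function name, the index-based representation of the target set, and the toy table are assumptions made for this example.

```python
def rough_membership(rows, attrs, target, i):
    """Rough membership of object `i` in `target` with respect to `attrs`.

    `rows` is a list of dicts, `target` is a set of row indices, and the
    equivalence class of object `i` contains every object that agrees
    with it on all attributes in `attrs` (hypothetical layout).
    """
    key = tuple(rows[i][a] for a in attrs)
    eq_class = {
        j for j, row in enumerate(rows)
        if tuple(row[a] for a in attrs) == key
    }
    # The class always contains object i itself, so it is never empty.
    return len(eq_class & target) / len(eq_class)


# Hypothetical toy table and target set (indices of objects in the concept).
patients = [
    {"fever": "yes", "cough": "yes"},
    {"fever": "yes", "cough": "yes"},
    {"fever": "yes", "cough": "no"},
    {"fever": "no", "cough": "no"},
]
flu = {0, 2}

m0 = rough_membership(patients, ["fever", "cough"], flu, 0)  # class {0, 1}
m2 = rough_membership(patients, ["fever", "cough"], flu, 2)  # class {2}
m3 = rough_membership(patients, ["fever", "cough"], flu, 3)  # class {3}
```

A value strictly between 0 and 1, as for object 0 here, means the object's equivalence class straddles the boundary of the target set.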
Knowledge Based Systems | 2015
Feng Jiang; Yuefei Sui
Discretization of continuous attributes is an important task in rough sets, and many discretization algorithms have been proposed. However, most current discretization algorithms are univariate, which may reduce the classification ability of a given decision table. To solve this problem, we propose a supervised and multivariate discretization algorithm for rough sets, SMDNS, which is derived from the traditional naive scaler algorithm (called Naive). Given a decision table DT=(U,C,D,V,f), since SMDNS uses both class information and the interdependence among the condition attributes in C to determine the discretization scheme, SMDNS produces far fewer cuts than Naive, while the classification ability of DT remains unchanged after discretization. Experimental results show that SMDNS is efficient in terms of classification accuracy and the number of generated cuts. In particular, our algorithm achieves a satisfactory compromise between the number of cuts and the classification accuracy.
granular computing | 2005
Feng Jiang; Yuefei Sui; Cungen Cao
In this paper, we propose exploiting the framework of rough sets for detecting outliers: individuals who behave in an unexpected way or feature abnormal properties. The ability to locate outliers can help to maintain knowledge base integrity and to single out irregular individuals. First, we formally define the notions of exceptional set and minimal exceptional set. We then analyze some special cases of exceptional sets and minimal exceptional sets. Finally, we introduce a new definition of outliers as well as the definition of exceptional degree. By calculating the exceptional degree of each object in the minimal exceptional sets, we can find all outliers in a given dataset.
Artificial Intelligence Review | 2013
Feng Jiang; Yuefei Sui; Cungen Cao
Learning in the real world is interactive, incremental, and dynamic in multiple dimensions: new data may appear at any time, from anywhere, and in any form. Incremental learning is therefore increasingly important in real-world data mining scenarios. Decision trees, due to their characteristics, have been widely used for incremental learning. In this paper, we propose a novel incremental decision tree algorithm based on rough set theory. To improve computational efficiency, when a new instance arrives the algorithm follows the given decision tree adaptation strategies and only modifies an existing leaf node in the currently active decision tree or adds a new leaf node to it, which avoids the high time complexity that traditional incremental methods incur by rebuilding decision trees too many times. Moreover, a rough set-based attribute reduction method is used to filter out redundant attributes from the original attribute set. We adopt two basic notions of rough sets, significance of attributes and dependency of attributes, as the heuristic information for selecting splitting attributes. Finally, we apply the proposed algorithm to intrusion detection. The experimental results demonstrate that our algorithm can provide competitive solutions to incremental learning.
Theoretical Computer Science | 2006
Zaiyue Zhang; Yuefei Sui; Cungen Cao; Guohua Wu
We establish in this paper a fuzzy propositional modal logic, FPML, and the associated semantics, fuzzy Kripke semantics. We prove that FPML is sound and complete. Furthermore, we set up a formalized reasoning mechanism based on FPML.
Information Sciences | 2016
Feng Jiang; Guozhu Liu; Junwei Du; Yuefei Sui
We considered the initialization of K-modes clustering from the view of outlier detection. We proposed an initialization algorithm for K-modes clustering via the distance-based outlier detection technique. We presented a partition entropy-based outlier detection technique, and designed an initialization algorithm via it. We proposed a new distance metric - weighted matching distance metric. The effectiveness of our initialization algorithms was shown on several UCI data sets.
The K-modes clustering has received much attention, since it works well for categorical data sets. However, the performance of K-modes clustering is especially sensitive to the selection of initial cluster centers. Therefore, choosing the proper initial cluster centers is a key step for K-modes clustering. In this paper, we consider the initialization of K-modes clustering from the view of outlier detection. We present two different initialization algorithms for K-modes clustering, where the first is based on the traditional distance-based outlier detection technique, and the second is based on the partition entropy-based outlier detection technique. By using the above two outlier detection techniques to calculate the degree of outlierness of each object, our algorithms can guarantee that the chosen initial cluster centers are not outliers. Moreover, during the process of initialization, we adopt a new distance metric - weighted matching distance metric, to calculate the distance between two objects described by categorical attributes. Experimental results on several UCI data sets demonstrate the effectiveness of our initialization algorithms for K-modes clustering.
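The abstract does not spell out the weighted matching distance, so the sketch below shows only the plain (unweighted) simple matching distance that is commonly used with K-modes: the number of attributes on which two categorical objects disagree. It is a baseline for orientation, not the metric proposed in the paper. The function name and the tuple-based object layout are assumptions made for this example.

```python
def simple_matching_distance(x, y):
    """Number of positions at which two categorical objects differ.

    Unweighted baseline only; the paper's weighted variant is not
    described in the abstract and is not reproduced here.
    """
    if len(x) != len(y):
        raise ValueError("objects must have the same number of attributes")
    return sum(a != b for a, b in zip(x, y))


# Hypothetical categorical objects described by (color, size, shape).
a = ("red", "small", "round")
b = ("red", "large", "round")
c = ("blue", "large", "square")

d_ab = simple_matching_distance(a, b)  # differ on size only
d_ac = simple_matching_distance(a, c)  # differ on all three attributes
```

Because every mismatch counts equally here, a weighted scheme would change the result only by scaling the contribution of individual attributes.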
RSCTC'06 Proceedings of the 5th international conference on Rough Sets and Current Trends in Computing | 2006
Feng Jiang; Yuefei Sui; Cungen Cao
In recent years, much attention has been given to the problem of outlier detection, whose aim is to detect outliers — individuals who behave in an unexpected way or have abnormal properties. Outlier detection is critically important in the information-based society. In this paper, we propose a new definition for outliers in rough set theory which exploits the rough membership function. An algorithm to find such outliers in rough set theory is also given. The effectiveness of our method for outlier detection is demonstrated on two publicly available databases.
granular computing | 2003
Yuefei Sui; Youming Xia; Ju Wang
Beaubouef, Petry and Buckles proposed the generalized rough set database analysis (GRSDA) to discuss rough relational databases. Given any rough relational database (U, A) and an attribute a ∈ A, as in rough set theory, a definition of the lower and upper approximations based on φa is given. The entropy and conditional entropy of similarity relations in a rough relational database are defined. The examples show that the entropy of a similarity relation does not decrease as the similarity relation is refined. It will be proved that given any two similarity relations φ and ψ, defined by a set C of conditional attributes and a decision attribute d, respectively, if d similarly depends on C in a rough relational database then the conditional entropy of φ with respect to ψ is equal to the entropy of φ.
Pattern Recognition Letters | 2011
Feng Jiang; Yuefei Sui; Cungen Cao
In recent years, much attention has been given to the problem of outlier detection, whose aim is to detect outliers: objects that behave in an unexpected way or have abnormal properties. The identification of outliers is important for many applications, such as intrusion detection, credit card fraud detection, the detection of criminal activities in electronic commerce, medical diagnosis, and anti-terrorism. In this paper, we propose a hybrid approach to outlier detection, which combines boundary-based and distance-based methods for outlier detection (Jiang et al., 2005, 2009; Knorr and Ng, 1998). We give a novel definition of outliers, BD (boundary and distance)-based outliers, by virtue of the notion of boundary region in rough set theory and the definitions of distance-based outliers. An algorithm to find such outliers is also given. The effectiveness of our method for outlier detection is demonstrated on two publicly available databases.