
Yuejin Yan

National University of Defense Technology

Publication

Featured research published by Yuejin Yan.

international conference of fuzzy information and engineering | 2007

A Survey of Fuzzy Decision Tree Classifier Methodology

Tao Wang; Zhoujun Li; Yuejin Yan; Huowang Chen

Decision-tree algorithms provide one of the most popular methodologies for symbolic knowledge acquisition. The resulting knowledge, a symbolic decision tree along with a simple inference mechanism, has been praised for comprehensibility. The most comprehensible decision trees have been designed for perfect symbolic data. Over the years, additional methodologies have been investigated and proposed to deal with continuous or multi-valued data, and with missing or noisy features. Recently, with the growing popularity of fuzzy representation, some researchers have proposed to utilize fuzzy representation in decision trees to deal with similar situations. This paper presents a survey of current methods for FDT (Fuzzy Decision Tree) design and the various existing issues. After considering the potential advantages of FDTs over traditional decision tree classifiers, we discuss FDT attribute selection criteria, inference for decision assignment, and decision and stopping criteria. To the best of our knowledge, this is the first overview of fuzzy decision tree classifiers.

machine learning and data mining in pattern recognition | 2007

An Incremental Fuzzy Decision Tree Classification Method for Mining Data Streams

Tao Wang; Zhoujun Li; Yuejin Yan; Huowang Chen

One of the most important algorithms for mining data streams is VFDT. It uses the Hoeffding inequality to achieve a probabilistic bound on the accuracy of the tree constructed. Gama et al. have extended VFDT in two directions. Their system VFDTc can deal with continuous data and use more powerful classification techniques at tree leaves. In this paper, we revisit this problem and implement a system, fVFDT, on top of VFDT and VFDTc. We make the following four contributions: 1) We present a threaded binary search tree (TBST) approach for efficiently handling continuous attributes. It builds a threaded binary search tree, and its processing time for inserting values is O(n log n), while VFDT's processing time is O(n^2). When a new example arrives, VFDTc needs to update O(log n) attribute tree nodes, but fVFDT needs to update only the one necessary node. 2) We improve the method of finding the best split-test point of a given continuous attribute. Compared with the method used in VFDTc, processing time improves from O(n log n) to O(n). 3) Compared with VFDTc, fVFDT's number of candidate split tests decreases from O(n) to O(log n). 4) We adapt the soft discretization method for use in data stream mining; it overcomes the problem of noisy data and improves classification accuracy.
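The Hoeffding-based split decision mentioned in the abstract above can be sketched as follows. This is a minimal illustration of the standard bound, epsilon = sqrt(R^2 ln(1/delta) / (2n)), and not the fVFDT implementation; the function names and the parameters `value_range` (R, the range of the split measure), `delta` (allowed error probability), and `n` (examples seen at the leaf) are assumptions chosen for this example.

```python
import math


def hoeffding_bound(value_range: float, delta: float, n: int) -> float:
    """Return epsilon such that, with probability 1 - delta, the true mean of a
    variable with the given range lies within epsilon of the mean of n samples."""
    return math.sqrt(value_range ** 2 * math.log(1.0 / delta) / (2.0 * n))


def should_split(best_gain: float, second_gain: float,
                 value_range: float, delta: float, n: int) -> bool:
    """Split when the observed gap between the two best attributes exceeds epsilon.

    Hypothetical helper: tie-breaking and minimum-example thresholds used by
    real systems are left out of this sketch.
    """
    return (best_gain - second_gain) > hoeffding_bound(value_range, delta, n)
```

Because epsilon shrinks as n grows, a leaf that cannot yet separate its two best attributes simply waits for more examples before splitting.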

international conference on systems | 2007

An Efficient Classification System Based on Binary Search Trees for Data Streams Mining

Tao Wang; Zhoujun Li; Yuejin Yan; Huowang Chen; JinShan Yu

international conference on emerging technologies | 2007

A new decision tree classification method for mining high-speed data streams based on threaded binary search trees

Tao Wang; Zhoujun Li; Xiaohua Hu; Yuejin Yan; Huowang Chen

international conference on conceptual modeling | 2004

Fast Mining Maximal Frequent ItemSets Based on FP-Tree

Yuejin Yan; Zhoujun Li; Huowang Chen


machine learning and data mining in pattern recognition | 2007

Mining Maximal Frequent Itemsets in Data Streams Based on FP-Tree

Fujiang Ao; Yuejin Yan; Jian Huang; Kedi Huang

... When a new example arrives, VFDTc needs to update O(log n) attribute tree nodes, but VFDTt needs to update only the one necessary node. 2) We improve the method of finding the best split-test point of a given continuous attribute. Compared with the method used in VFDTc, processing time improves from O(n log n) to O(n). 3) Compared with VFDTc, VFDTt's number of candidate split tests decreases from O(n) to O(log n). Compared with VFDT, the most relevant property of our system is an average reduction of 25.53% in processing time, while keeping the same tree size and accuracy. Overall, the techniques introduced here significantly improve the efficiency of decision tree classification on data streams.

FAW'07 Proceedings of the 1st annual international conference on Frontiers in algorithmics | 2007

A new fuzzy decision tree classification method for mining high-speed data streams based on binary search trees

Zhoujun Li; Tao Wang; Ruoxue Wang; Yuejin Yan; Huowang Chen

Maximal frequent itemset (MFI) mining is a fundamental and important problem in many data mining applications. Since the MaxMiner algorithm introduced enumeration trees for MFI mining in 1998, several methods have been proposed that use depth-first search to improve performance. This paper presents FIMfi, a new depth-first algorithm based on FP-tree and MFI-tree for mining MFIs. FIMfi adopts a novel item ordering policy for efficient lookahead pruning, and a simple method for fast superset checking. It uses a variety of old and new pruning techniques to prune the search space. Experimental comparison with previous work reveals that FIMfi greatly reduces the number of FP-trees created and outperforms similar algorithms by more than 40% on average.
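To make the notion of a maximal frequent itemset concrete, the sketch below enumerates frequent itemsets by brute force and keeps those with no frequent proper superset. It is not FIMfi: it uses no FP-tree, MFI-tree, item ordering, or pruning, and it is practical only for tiny inputs. The function names and the absolute `min_support` count are assumptions made for this illustration.

```python
from itertools import combinations


def frequent_itemsets(transactions, min_support):
    """Return all itemsets contained in at least min_support transactions."""
    transactions = [set(t) for t in transactions]
    items = sorted({item for t in transactions for item in t})
    frequent = []
    for k in range(1, len(items) + 1):
        level = [
            frozenset(candidate)
            for candidate in combinations(items, k)
            if sum(1 for t in transactions if set(candidate) <= t) >= min_support
        ]
        if not level:
            # No frequent k-itemsets implies no frequent (k+1)-itemsets.
            break
        frequent.extend(level)
    return frequent


def maximal_frequent_itemsets(transactions, min_support):
    """Keep only frequent itemsets that have no frequent proper superset."""
    frequent = frequent_itemsets(transactions, min_support)
    return [s for s in frequent if not any(s < other for other in frequent)]
```

The final filter is the superset check in its naive quadratic form; tree-based miners exist largely to avoid this cost and the exhaustive enumeration before it.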

australasian joint conference on artificial intelligence | 2004

Mining maximal frequent itemsets using combined FP-Tree

Yuejin Yan; Zhoujun Li; Tao Wang; Yuexin Chen; Huowang Chen

Mining maximal frequent itemsets in data streams is more difficult than mining them in static databases because data streams are huge, high-speed, and continuous. In this paper, we propose a novel one-pass algorithm called FpMFI-DS, which mines all maximal frequent itemsets in landmark windows or sliding windows in data streams based on FP-Tree. A new FP-Tree structure is designed for storing all transactions in landmark windows or sliding windows in data streams. To improve the efficiency of the algorithm, a new pruning technique, extension support equivalency pruning (ESEquivPS), is introduced. The experiments show that our algorithm is efficient and scalable. It is suitable for mining MFIs in both static databases and data streams.

fuzzy systems and knowledge discovery | 2007

A Novel Pruning Technique for Mining Maximal Frequent Itemsets

Fujiang Ao; Yuejin Yan; Jian Huang; Kedi Huang

Decision tree construction is a well-studied problem in data mining. Recently, there has been much interest in mining data streams. Domingos and Hulten have presented a one-pass algorithm for decision tree construction. Their system, VFDT, uses the Hoeffding inequality to achieve a probabilistic bound on the accuracy of the tree constructed. Gama et al. have extended VFDT in two directions. Their system VFDTc can deal with continuous data and use more powerful classification techniques at tree leaves. Peng et al. present a soft discretization method for handling continuous attributes in data mining. In this paper, we revisit these problems and implement a system, sVFDT, for data stream mining. We make the following contributions: 1) We present a binary search tree (BST) approach for efficiently handling continuous attributes. Its processing time for inserting values is O(n log n), while VFDT's processing time is O(n^2). 2) We improve the method of finding the best split-test point of a given continuous attribute. Compared with the method used in VFDTc, processing time decreases from O(n log n) to O(n). 3) Compared with VFDTc, sVFDT's number of candidate split tests decreases from O(n) to O(log n). 4) We improve the soft discretization method to increase classification accuracy in data stream mining.
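The idea of keeping a continuous attribute's values in a binary search tree and then scanning them once for the best split point can be sketched as follows. This is a simplified illustration, not sVFDT or VFDTc: the tree is unbalanced (so insertion is O(log n) only on average), there is no threading and no soft discretization, and the names `Node`, `insert`, `in_order`, and `best_split` are hypothetical.

```python
import math
from collections import Counter


class Node:
    """One distinct attribute value with per-class counts of examples seen."""

    def __init__(self, value, label):
        self.value = value
        self.counts = Counter({label: 1})
        self.left = None
        self.right = None


def insert(root, value, label):
    """Insert one (value, label) example and return the tree root."""
    if root is None:
        return Node(value, label)
    node = root
    while True:
        if value == node.value:
            node.counts[label] += 1
            return root
        branch = "left" if value < node.value else "right"
        child = getattr(node, branch)
        if child is None:
            setattr(node, branch, Node(value, label))
            return root
        node = child


def in_order(node):
    """Yield (value, counts) pairs in ascending value order."""
    stack = []
    while stack or node:
        while node:
            stack.append(node)
            node = node.left
        node = stack.pop()
        yield node.value, node.counts
        node = node.right


def entropy(counts):
    total = sum(counts.values())
    if total == 0:
        return 0.0
    return -sum(c / total * math.log2(c / total) for c in counts.values() if c)


def best_split(root):
    """Return (threshold, gain) for the test `value <= threshold` with highest gain."""
    items = list(in_order(root))
    total = Counter()
    for _, counts in items:
        total.update(counts)
    n = sum(total.values())
    base = entropy(total)
    left, best = Counter(), (None, 0.0)
    for value, counts in items[:-1]:  # the largest value cannot split anything
        left.update(counts)  # running class counts for value <= threshold
        n_left = sum(left.values())
        gain = (base - n_left / n * entropy(left)
                - (n - n_left) / n * entropy(total - left))
        if gain > best[1]:
            best = (value, gain)
    return best
```

A single in-order pass with running class counts visits each distinct value once, which is the kind of linear scan the abstract contrasts with sorting the values first.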

computer and information technology | 2008

An Efficient Algorithm for Mining Closed Frequent Itemsets in Data Streams

Fujiang Ao; Jing Du; Yuejin Yan; Baohong Liu; Kedi Huang

Maximal frequent itemset (MFI) mining is one of the most fundamental problems in data mining. In this paper, we present CfpMfi, a new depth-first search algorithm based on CFP-tree for mining MFIs. Based on the new data structure CFP-tree, which is a combination of FP-tree and MFI-tree, CfpMfi uses a variety of pruning techniques and a novel item ordering policy to reduce the search space efficiently. Experimental comparison with previous work reveals that, on dense datasets, CfpMfi prunes the search space efficiently, outperforms other MFI mining algorithms, and uses less main memory than similar algorithms.
