Jia-Dong Ren
Yanshan University
Publication
Featured research published by Jia-Dong Ren.
international conference on machine learning and cybernetics | 2010
Guo-Yan Huang; Dapeng Liang; Changzhen Hu; Jia-Dong Ren
In many applications, heterogeneous data streams with uncertainty are ubiquitous. However, existing methods for clustering heterogeneous data streams with uncertainty achieve low clustering quality. In this paper, an algorithm for clustering heterogeneous data streams with uncertainty, called HU-Clustering, is proposed. A Heterogeneous Uncertainty Clustering Feature (H-UCF) is presented to describe the features of heterogeneous data streams with uncertainty. Based on H-UCF, a probability frequency histogram is proposed to track the statistics of categorical attributes; the algorithm initially creates n clusters using the k-prototypes algorithm. To improve clustering quality, a two-phase stream clustering selection process is applied in HU-Clustering. First, the candidate clusters are selected through a new similarity measure; second, the most similar cluster for each newly arriving tuple is selected from the candidate cluster set according to clustering uncertainty. The experimental results show that the clustering quality of HU-Clustering is higher than that of UMicro.
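As a rough illustration of the probability frequency histogram idea, the sketch below accumulates, for one categorical attribute, the existence probability of each tuple instead of a raw count. The class name, the per-tuple probability input, and the sample values are assumptions made for illustration; this is not the H-UCF definition from the paper.

```python
from collections import defaultdict


class ProbabilityFrequencyHistogram:
    """Hypothetical sketch: expected frequency of each categorical value."""

    def __init__(self):
        self.weights = defaultdict(float)  # value -> summed probability
        self.total = 0.0                   # summed probability over all tuples

    def add(self, value, probability):
        """Record one uncertain tuple whose attribute takes `value`."""
        if not 0.0 <= probability <= 1.0:
            raise ValueError("probability must lie in [0, 1]")
        self.weights[value] += probability
        self.total += probability

    def relative_frequency(self, value):
        """Share of the accumulated probability mass held by `value`."""
        if self.total == 0.0:
            return 0.0
        return self.weights.get(value, 0.0) / self.total


# Illustrative data: three uncertain tuples for a "protocol" attribute.
hist = ProbabilityFrequencyHistogram()
hist.add("tcp", 0.5)
hist.add("tcp", 0.25)
hist.add("udp", 0.25)
```

A cluster summary could keep one such histogram per categorical attribute, so that a new tuple can be compared against the expected value distribution rather than against individual tuples.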
international conference on machine learning and cybernetics | 2010
Hai-Tao He; Hai-Yan Cao; Ruixia Yao; Jia-Dong Ren; Changzhen Hu
Frequent itemset mining is a crucial problem in the field of data mining. Although many related algorithms have been proposed, they may suffer from high computation cost and space complexity on dense databases, especially when mining long frequent itemsets or when the support threshold is very low. To address this problem, a new data structure called P Array is proposed. P Array makes use of data both horizontally and vertically, like Bit Table FI, and the itemsets that co-occur with single frequent items are found by computing intersections in P Array. Then, a new algorithm, called MFIPA, is proposed based on P Array. Frequent itemsets that have the same support as a single frequent item are found first by combining that item with every nonempty subset of its projection; all other frequent itemsets are then found using a depth-first search strategy. The experimental results show that the proposed algorithm is superior to Bit Table FI in execution efficiency and memory requirement, especially for dense databases.
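The intersection-based support counting that vertical layouts rely on can be sketched as follows. This is a generic bitmap illustration, not the P Array structure or the MFIPA algorithm; the function names and the toy transactions are assumptions.

```python
def build_bitmaps(transactions):
    """Map each item to an integer whose bit i is set if transaction i contains it."""
    bitmaps = {}
    for tid, items in enumerate(transactions):
        for item in items:
            bitmaps[item] = bitmaps.get(item, 0) | (1 << tid)
    return bitmaps


def support(itemset, bitmaps, n_transactions):
    """Count transactions containing every item by AND-ing the item bitmaps."""
    mask = (1 << n_transactions) - 1  # start with all transactions selected
    for item in itemset:
        mask &= bitmaps.get(item, 0)  # unknown items clear the mask
    return bin(mask).count("1")


# Toy database of four transactions (illustrative only).
transactions = [{"a", "b", "c"}, {"a", "c"}, {"a", "d"}, {"b", "c"}]
bitmaps = build_bitmaps(transactions)
```

With this layout, the support of an itemset is obtained from a few bitwise operations instead of a scan over the transactions, which is why such representations tend to help on dense data.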
international conference on machine learning and cybernetics | 2010
Guo-Yan Huang; Fei Yang; Changzhen Hu; Jia-Dong Ren
Frequent closed sequential pattern mining is one of the hot topics in data mining. In this paper, a novel frequent closed sequential pattern mining algorithm, FCSM-PD (frequent closed sequential pattern mining algorithm based on positional data), is proposed; it improves the BIDE algorithm by using positional data. The positional data records the position information of items. By storing all the position information of the prefix sequences in advance, the existence of a position extension for a prefix sequence can be verified by scanning the position information of that prefix sequence, rather than by repeatedly scanning the pseudo-projected database as in the BI-Directional Extension closure checking scheme, which is the most time-consuming phase of BIDE. In addition, an optimization strategy is applied to reduce the time and memory cost of the mining process. The experimental results show that FCSM-PD runs significantly faster than BIDE, especially on dense databases.
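The general idea of answering extension queries from stored positions, instead of rescanning a sequence, can be sketched with a per-sequence position index. The index layout and helper names below are assumptions for illustration; they are not the positional data structure or the closure check used by FCSM-PD.

```python
from bisect import bisect_right


def build_position_index(sequence):
    """Map each item to the ascending list of positions where it occurs."""
    index = {}
    for pos, item in enumerate(sequence):
        index.setdefault(item, []).append(pos)
    return index


def next_occurrence(index, item, after):
    """Return the first position of `item` strictly after `after`, or None."""
    positions = index.get(item, [])
    i = bisect_right(positions, after)
    return positions[i] if i < len(positions) else None


def contains_pattern(index, pattern):
    """Check whether `pattern` is a subsequence using only the index."""
    pos = -1
    for item in pattern:
        pos = next_occurrence(index, item, pos)
        if pos is None:
            return False
    return True


# Illustrative sequence of single items.
index = build_position_index(["a", "b", "a", "c", "b"])
```

Each lookup is a binary search over an item's position list, so checking an extension does not require walking the sequence itself.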
international conference on machine learning and cybernetics | 2004
Jia-Dong Ren; Yin-Bo Cheng; Liang-Liang Yang
Sequential pattern mining is an important data mining problem with broad applications. Algorithm GSP discovers generalized sequential patterns. However, GSP still encounters problems when the sequence database is large and/or when the sequential patterns to be mined are long. Algorithm PrefixSpan mines the complete set of sequential patterns faster than GSP, but it cannot mine generalized sequential patterns with time constraints, time windows, and/or taxonomies. In this paper, a new enhanced method based on PrefixSpan, called EPSpan, is proposed; it absorbs the spirit of PrefixSpan and extends it towards mining generalized sequential patterns.
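For orientation, prefix-projected pattern growth in the style of PrefixSpan can be sketched as below. This is a simplified version for sequences of single items, written under the assumption that each element is one item; it does not include the time constraints, time windows, or taxonomy handling that EPSpan adds, and the function name and toy database are illustrative.

```python
def prefixspan(sequences, min_support):
    """Return {pattern tuple: support} for sequences of single items."""
    results = {}

    def grow(prefix, projected):
        # `projected` holds (sequence index, start offset) pairs.
        counts = {}
        for seq_id, start in projected:
            seen = set()
            for item in sequences[seq_id][start:]:
                if item not in seen:  # count each item once per sequence
                    seen.add(item)
                    counts[item] = counts.get(item, 0) + 1
        for item, count in sorted(counts.items()):
            if count < min_support:
                continue
            pattern = prefix + (item,)
            results[pattern] = count
            # Project on the first occurrence of `item` at or after the offset.
            next_projected = []
            for seq_id, start in projected:
                seq = sequences[seq_id]
                for pos in range(start, len(seq)):
                    if seq[pos] == item:
                        next_projected.append((seq_id, pos + 1))
                        break
            grow(pattern, next_projected)

    grow((), [(i, 0) for i in range(len(sequences))])
    return results


# Toy sequence database (illustrative only).
db = [["a", "b", "c"], ["a", "c"], ["b", "c"]]
patterns = prefixspan(db, min_support=2)
```

The projected database is represented by offsets rather than copied suffixes, which mirrors the pseudo-projection idea commonly used with this family of algorithms.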
international conference on machine learning and cybernetics | 2003
Jia-Dong Ren; Jie Bao; Hui-Yu Huang
In this paper we present a moving spatio-temporal data model based on spatio-temporal data types, utilizing the benefits of object-relational database systems. In particular, we emphasize the aspect of data mining in such a data model. The tasks of spatio-temporal data mining, such as mining spatio-temporal rules and analyzing similar trajectories, are discussed. We also discuss some techniques that can be adopted to improve the performance of spatio-temporal data mining.
international conference on machine learning and cybernetics | 2006
Jia-Dong Ren; Jun-Sheng Zong
A sequential pattern mining algorithm discovers all patterns that meet the user-specified minimum support threshold. However, a user is very unlikely to obtain satisfactory patterns in just one query. This paper proposes a new interactive sequential pattern mining algorithm based on memory indexing, named MIFSPM. Because it adopts a memory indexing technique, it scans the sequence database only once to read the data sequences into memory. A compact lattice frequent pattern tree (abbreviated as LFP-tree) saves previous results, and its root node saves two minimum support thresholds. Each node except the root stores not only frequent patterns and support information but also an index set mapped table (abbreviated as ISMT). When to update the structure is decided by comparing the two minimum support thresholds, and the information contained in the ISMT is used to quickly mine new frequent sequential patterns without candidate generation. Experiments demonstrate the good performance and scalability of MIFSPM under various minimum support thresholds. MIFSPM can therefore mine frequent sequential patterns efficiently and outperforms the other algorithms.
international conference on machine learning and cybernetics | 2005
Jia-Dong Ren; Xiao-Lei Zhou
Mining of sequential patterns is an important issue among the various data mining problems. The problem of incremental mining of sequential patterns deserves as much attention. In this paper, we consider the problem of the incremental updating of sequential pattern mining when some transactions and/or data sequences are deleted from the original sequence database. We present a new algorithm, called IU_D, for mining frequent sequences so as to make full use of information obtained during an earlier mining process for reducing the cost of finding new sequential patterns in the updated database. The results of our experiment show that the algorithm performs significantly faster than the naive approach of mining the entire updated database from scratch.
international conference on machine learning and cybernetics | 2003
Jia-Dong Ren; Hui-Yu Huang; Jie Bao
The generalization ability of the network and the training time are two important aspects that must be considered when designing neural network algorithms. At the same time, the optimization of the network architecture must be considered in every artificial neural network based on the BP algorithm. For larger networks, however, there is no suitable way to solve this problem. This paper proposes a special two-hidden-layer artificial neural network. After describing the major steps of the algorithm, some experimental results and analysis are given. The experimental results indicate that the generalization ability, training time, and architecture optimization of the networks are clearly improved by this algorithm.
international conference on machine learning and cybernetics | 2009
Guo-Yan Huang; Libo Wang; Changzhen Hu; Jia-Dong Ren; Hui-Ling He
Mining maximal frequent itemsets is an active research area in data stream mining. A new algorithm, called MFI-TD (mine maximal frequent itemsets based on time decay model), is proposed for mining maximal frequent itemsets. A new data structure, called PW-tree (Point based Window-tree), is introduced to store each transaction in the current window, and the final node of a path that denotes a maximal frequent itemset is pointed to by the DP (domain pointer). Using this data structure and the time decay model, MFI-TD gradually reduces the weight of the support counts of historical transactions and deletes the obsolete and infrequent itemset branches in the PW-tree. Thus MFI-TD decreases the space complexity and reduces the maintenance cost of the PW-tree. Experimental results show that MFI-TD has better space efficiency and result accuracy than the DSM-MFI algorithm.
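A time decay model of this kind can be illustrated with an exponentially decayed support counter. The decay factor, class name, and pruning threshold below are assumptions chosen for illustration; the PW-tree and the exact decay function of MFI-TD are not reproduced.

```python
class DecayedSupport:
    """Hypothetical decayed support counts for itemsets in a stream."""

    def __init__(self, decay=0.5):
        if not 0.0 < decay <= 1.0:
            raise ValueError("decay must lie in (0, 1]")
        self.decay = decay
        self.counts = {}  # frozenset(itemset) -> decayed support

    def observe(self, itemsets):
        """Age all existing counts by one step, then add the new batch."""
        for key in self.counts:
            self.counts[key] *= self.decay
        for itemset in itemsets:
            key = frozenset(itemset)
            self.counts[key] = self.counts.get(key, 0.0) + 1.0

    def prune(self, threshold):
        """Drop itemsets whose decayed support fell below the threshold."""
        self.counts = {k: v for k, v in self.counts.items() if v >= threshold}


# Three time steps of an illustrative stream.
tracker = DecayedSupport(decay=0.5)
tracker.observe([{"a", "b"}, {"c"}])
tracker.observe([{"a", "b"}])
tracker.observe([{"a", "b"}])
tracker.prune(threshold=0.5)
```

Itemsets that stop appearing lose weight at every step and are eventually pruned, which keeps the summary small as the stream evolves.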
international symposium on data privacy and e commerce | 2007
Jia-Dong Ren; Wanchang Jiang; Cong Huo
Two novel sampling approaches are proposed to obtain a random sample of the exact streaming window join result. Without assuming any model of stream arrivals, the frequency of join attribute values for various basic periods can be obtained from a frequency balanced binary tree histogram (FATH), which is constructed for each stream. The frequency for the future window can be computed by linear regression with the help of the information in the FATH. With the random sample of the exact join result, a windowed aggregate over the exact join results can be estimated accurately and without bias. Experimental results show that our approach is more efficient than other approaches for arbitrary streams.
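The regression step can be illustrated with an ordinary least-squares line fitted to the frequencies of one join attribute value over past basic periods and extrapolated one period ahead. The function name, the clamping at zero, and the sample counts are assumptions for illustration; the FATH structure itself is not modelled.

```python
def predict_next_frequency(history):
    """Fit y = a + b*t by least squares over t = 0..n-1 and return y at t = n."""
    n = len(history)
    if n < 2:
        raise ValueError("need at least two periods to fit a line")
    mean_t = (n - 1) / 2
    mean_y = sum(history) / n
    sxx = sum((t - mean_t) ** 2 for t in range(n))
    sxy = sum((t - mean_t) * (y - mean_y) for t, y in enumerate(history))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_t
    # Frequencies cannot be negative, so clamp the extrapolation at zero.
    return max(0.0, intercept + slope * n)


# Illustrative frequencies of one attribute value in four past periods.
estimate = predict_next_frequency([10, 12, 14, 16])
```

For the perfectly linear history above, the fitted line has slope 2 and the extrapolated frequency for the next period is 18.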