Changzhen Hu

Beijing Institute of Technology

Publication

Featured research published by Changzhen Hu.

International Conference on Machine Learning and Cybernetics | 2010

An algorithm for clustering heterogeneous data streams with uncertainty

Guo-Yan Huang; Dapeng Liang; Changzhen Hu; Jia-Dong Ren

Heterogeneous data streams with uncertainty are ubiquitous in many applications. However, existing methods for clustering such streams achieve relatively low clustering quality. In this paper, an algorithm for clustering heterogeneous data streams with uncertainty, called HU-Clustering, is proposed. A Heterogeneous Uncertainty Clustering Feature (H-UCF) is presented to describe the features of heterogeneous data streams with uncertainty. Based on H-UCF, a probability frequency histogram is proposed to track the statistics of categorical attributes, and the algorithm initially creates n clusters with the k-prototypes algorithm. To improve clustering quality, HU-Clustering applies a two-phase cluster selection process: first, candidate clusters are selected with a new similarity measure; second, for each newly arriving tuple, the most similar cluster in the candidate set is selected based on clustering uncertainty. The experimental results show that HU-Clustering achieves higher clustering quality than UMicro.
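The abstract does not give the exact H-UCF layout or similarity measure, so the following is only a minimal sketch under stated assumptions: each uncertain categorical attribute arrives as a hypothetical {value: probability} dict, the histogram accumulates probability mass per value, and the dissimilarity is a k-prototypes-style cost (squared Euclidean distance on numeric attributes plus gamma times an expected categorical mismatch). The class name and the expected-mismatch term are illustrative, not the paper's definitions.

```python
from collections import defaultdict


class MixedClusterFeature:
    """Hypothetical summary of one cluster over tuples with numeric attributes and
    uncertain categorical attributes (each given as a {value: probability} dict)."""

    def __init__(self, n_numeric, n_categorical):
        self.count = 0
        self.linear_sum = [0.0] * n_numeric
        # One probability-frequency histogram per categorical attribute:
        # value -> accumulated probability mass over all absorbed tuples.
        self.histograms = [defaultdict(float) for _ in range(n_categorical)]

    def add(self, numeric, categorical):
        """Absorb one tuple into the summary."""
        self.count += 1
        for i, x in enumerate(numeric):
            self.linear_sum[i] += x
        for j, dist in enumerate(categorical):
            for value, p in dist.items():
                self.histograms[j][value] += p

    def centroid(self):
        """Mean of the numeric attributes."""
        return [s / self.count for s in self.linear_sum]

    def dissimilarity(self, numeric, categorical, gamma=1.0):
        """k-prototypes-style cost: squared Euclidean distance on the numeric part plus
        gamma times the expected categorical mismatch against the histograms."""
        center = self.centroid()
        num = sum((x - m) ** 2 for x, m in zip(numeric, center))
        cat = 0.0
        for j, dist in enumerate(categorical):
            for value, p in dist.items():
                freq = self.histograms[j].get(value, 0.0) / self.count
                cat += p * (1.0 - freq)
        return num + gamma * cat
```

A new tuple would be assigned to whichever summary returns the smallest dissimilarity; gamma balances the numeric and categorical parts, as in k-prototypes.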

Fuzzy Systems and Knowledge Discovery | 2009

Efficient Outlier Detection Algorithm for Heterogeneous Data Streams

Jiadong Ren; Qunhui Wu; Jia Zhang; Changzhen Hu

Outlier mining in data streams is an important and active research issue in anomaly detection. Most existing outlier detection algorithms can handle only numeric attributes or only categorical attributes. In this paper, we propose an efficient outlier detection algorithm for heterogeneous data streams that partitions the stream into chunks. Each chunk is clustered, and the corresponding clustering results are stored as cluster references. The representation degree of each cluster reference and the number of its adjacent cluster references are computed to generate the final outlier references, which contain the potential outliers. Experimental results show that our approach achieves higher detection precision and better scalability.
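The chunk, cluster, and reference pipeline can be sketched roughly as follows. This is an illustration under assumptions, not the published algorithm: it handles numeric points only, uses one-pass leader clustering per chunk, takes "representation degree" to be a reference's share of all points, and counts neighbours within a hypothetical distance of twice the clustering radius.

```python
import math


def chunks(stream, size):
    """Split an iterable stream into consecutive lists of at most `size` items."""
    buf = []
    for item in stream:
        buf.append(item)
        if len(buf) == size:
            yield buf
            buf = []
    if buf:
        yield buf


def leader_cluster(points, radius):
    """One-pass leader clustering of a chunk; returns [center, member_count] pairs."""
    refs = []
    for p in points:
        for ref in refs:
            if math.dist(p, ref[0]) <= radius:
                ref[1] += 1
                break
        else:
            refs.append([p, 1])
    return refs


def outlier_references(stream, chunk_size, radius, min_fraction, min_neighbors):
    """Cluster each chunk, then flag cluster references that represent a small share
    of all points AND have few other references nearby."""
    refs, total = [], 0
    for chunk in chunks(stream, chunk_size):
        refs.extend(leader_cluster(chunk, radius))
        total += len(chunk)
    flagged = []
    for i, (center, count) in enumerate(refs):
        degree = count / total  # hypothetical stand-in for "representation degree"
        neighbors = sum(
            1 for j, (other, _) in enumerate(refs)
            if j != i and math.dist(center, other) <= 2 * radius
        )
        if degree < min_fraction and neighbors < min_neighbors:
            flagged.append(center)
    return flagged
```

Keeping only per-chunk summaries bounds memory, which is the usual motivation for chunking a stream.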

International Conference on Machine Learning and Cybernetics | 2010

Mining frequent itemsets based on projection array

Hai-Tao He; Hai-Yan Cao; Ruixia Yao; Jia-Dong Ren; Changzhen Hu

Frequent itemset mining is a crucial problem in data mining. Although many algorithms have been proposed, they may suffer from high computational cost and space complexity on dense databases, especially when mining long frequent itemsets or when the support threshold is very low. To address this problem, a new data structure called P Array is proposed. Like Bit Table FI, P Array uses the data both horizontally and vertically, and the itemsets that co-occur with a single frequent item are found by computing intersections in P Array. A new algorithm, called MFIPA, is then proposed based on P Array. Frequent itemsets that have the same support as a single frequent item are found first, by combining that item with every nonempty subset of its projection; all other frequent itemsets are then found with a depth-first search strategy. The experimental results show that the proposed algorithm outperforms Bit Table FI in execution efficiency and memory requirements, especially on dense databases.
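The "same support as a single frequent item" shortcut rests on a simple property: if every transaction containing item i also contains each item in a set P, then i combined with any nonempty subset of P has the same support as i. The sketch below illustrates that property with integer bitmaps over transactions; here "projection" is assumed to mean the items whose bitmaps cover i's bitmap, which may differ from the exact P Array definition.

```python
from itertools import combinations


def item_bitmaps(transactions):
    """Vertical layout: item -> int bitmask with bit t set if transaction t contains it."""
    bitmaps = {}
    for t, items in enumerate(transactions):
        for item in items:
            bitmaps[item] = bitmaps.get(item, 0) | (1 << t)
    return bitmaps


def support(mask):
    """Number of transactions in a bitmask."""
    return bin(mask).count("1")


def equal_support_itemsets(transactions, item):
    """Itemsets whose support equals that of `item`: the item joined with every
    nonempty subset of the items occurring in all transactions that contain it."""
    bitmaps = item_bitmaps(transactions)
    base = bitmaps[item]
    # Intersection test: j co-occurs with `item` everywhere iff base is a subset of j's bits.
    projection = sorted(j for j, m in bitmaps.items() if j != item and (m & base) == base)
    return [
        (frozenset((item,) + subset), support(base))
        for r in range(1, len(projection) + 1)
        for subset in combinations(projection, r)
    ]
```

For transactions {a, b, c}, {a, b}, {b, d}, item "a" always appears with "b", so {a, b} inherits a's support of 2 without any further counting.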

International Conference on Machine Learning and Cybernetics | 2010

Fast discovery of frequent closed sequential patterns based on positional data

Guo-Yan Huang; Fei Yang; Changzhen Hu; Jia-Dong Ren

Frequent closed sequential pattern mining is one of the hot topics in data mining. In this paper, a novel frequent closed sequential pattern mining algorithm, FCSM-PD (frequent closed sequential pattern mining algorithm based on positional data), is proposed; it improves the BIDE algorithm by using positional data. The positional data records the positions of items in the sequences. Because the position information of every prefix sequence is stored in advance, checking whether a prefix sequence has an extension can be done simply by scanning that prefix's position information, rather than by repeatedly scanning the pseudo-projected database as in the BI-Directional Extension closure checking scheme, which is the most time-consuming phase of BIDE. In addition, an optimization strategy is applied to reduce time and memory costs during mining. The experimental results show that FCSM-PD has significantly lower running time than BIDE, especially on dense databases.
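A minimal sketch of the positional-data idea, assuming a per-sequence index from each item to its sorted positions (the function names are hypothetical). Binary search over the stored positions replaces rescanning the sequence when testing a BIDE-style forward extension: an item that follows the first occurrence of the prefix in every sequence containing it. Backward-extension checking, which BIDE also needs, is omitted here.

```python
from bisect import bisect_right
from collections import defaultdict


def build_positions(sequence):
    """item -> sorted list of indices where it occurs in the sequence."""
    pos = defaultdict(list)
    for i, item in enumerate(sequence):
        pos[item].append(i)
    return pos


def first_match_end(positions, prefix):
    """Index of the last item of the leftmost occurrence of `prefix`, or None."""
    cur = -1
    for item in prefix:
        plist = positions.get(item, [])
        k = bisect_right(plist, cur)  # first position strictly after `cur`
        if k == len(plist):
            return None
        cur = plist[k]
    return cur


def has_forward_extension(positions, prefix, item):
    """True if `item` occurs after the first occurrence of `prefix` in this sequence."""
    end = first_match_end(positions, prefix)
    if end is None:
        return False
    plist = positions.get(item, [])
    return bisect_right(plist, end) < len(plist)


def is_forward_extendable(db_positions, prefix, item):
    """BIDE-style forward-extension test: `item` follows the first occurrence of
    `prefix` in every sequence that contains `prefix`."""
    containing = [p for p in db_positions if first_match_end(p, prefix) is not None]
    return bool(containing) and all(has_forward_extension(p, prefix, item) for p in containing)
```

If `is_forward_extendable` returns True, the prefix is not closed, because extending it by `item` keeps the same support.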

International Conference on Innovative Computing, Information and Control | 2009

CABGD: An Improved Clustering Algorithm Based on Grid-Density

Lili Meng; Jiadong Ren; Changzhen Hu

Clustering is an important problem in data mining. Compared with other algorithms, grid-based algorithms generally have fast processing times. However, because the cell size is set by the user, a large cell may contain data points from different clusters, which leads to low clustering quality. In this paper, we propose an improved clustering algorithm based on grid density (CABGD). The concept of the center intensity of a grid cell is introduced and used to identify the distribution of data points within a cell and to decide whether the cell should be split. All density-connected cells are then assigned to the same cluster. Experimental results on synthetic datasets show that the algorithm has higher clustering accuracy and lower sensitivity to parameters.
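The baseline grid-density step (bin points into cells, keep dense cells, merge density-connected cells) can be sketched as below for 2-D points. This shows only the generic grid-based technique; CABGD's center-intensity test and cell splitting are not modelled, and the 8-neighbourhood adjacency and the `min_points` threshold are assumptions.

```python
from collections import defaultdict, deque


def grid_density_clusters(points, cell_size, min_points):
    """Assign 2-D points to square cells, keep cells with >= min_points, and
    label connected dense cells (8-neighbourhood) as one cluster.
    Returns {point: cluster_id}; points in sparse cells are left out as noise."""
    cells = defaultdict(list)
    for x, y in points:
        cells[(int(x // cell_size), int(y // cell_size))].append((x, y))
    dense = {c for c, pts in cells.items() if len(pts) >= min_points}

    label, next_id = {}, 0
    for start in dense:
        if start in label:
            continue
        # Breadth-first search over adjacent dense cells.
        label[start] = next_id
        queue = deque([start])
        while queue:
            cx, cy = queue.popleft()
            for dx in (-1, 0, 1):
                for dy in (-1, 0, 1):
                    nb = (cx + dx, cy + dy)
                    if nb in dense and nb not in label:
                        label[nb] = next_id
                        queue.append(nb)
        next_id += 1
    return {p: label[c] for c, pts in cells.items() if c in label for p in pts}
```

The sketch also shows the weakness the abstract targets: with a coarse `cell_size`, two nearby clusters can fall into one dense cell and can no longer be separated.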

International Conference on Innovative Computing, Information and Control | 2009

A Weighted Subspace Clustering Algorithm in High-Dimensional Data Streams

Jiadong Ren; Lining Li; Changzhen Hu

Clustering is a significant and difficult problem in data stream mining because massive amounts of streaming data arrive continuously. High-dimensional data streams make clustering analysis more complex because the data are sparse. In this paper, we propose a new clustering method for high-dimensional data streams, called WSCStream. The method combines a fading cluster structure with a dimensional weight matrix, in which each dimension of each cluster is assigned a weight indicating how important that dimension is to the cluster. As new data points arrive over time, the weighted distance between a cluster and a data point is used to obtain the final clusters. Experimental results on real and synthetic datasets demonstrate that WSCStream achieves higher clustering quality than PHStream.
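A minimal sketch of a per-cluster weighted distance and a fading factor, assuming a weighted Euclidean form and the common exponential decay 2^(-decay * age). The abstract specifies neither formula, and how WSCStream learns the weights is not modelled, so `weight_matrix` here is simply supplied by the caller.

```python
import math


def fading(age, decay):
    """Exponential fading factor 2**(-decay * age), a common choice in stream clustering."""
    return 2.0 ** (-decay * age)


def weighted_distance(point, center, weights):
    """Euclidean distance in which each dimension's squared difference is scaled by its weight."""
    return math.sqrt(sum(w * (x - c) ** 2 for x, c, w in zip(point, center, weights)))


def nearest_cluster(point, centers, weight_matrix):
    """Index of the closest cluster; weight_matrix[k][d] is the (hypothetical)
    weight of dimension d for cluster k."""
    dists = [weighted_distance(point, c, w) for c, w in zip(centers, weight_matrix)]
    return min(range(len(dists)), key=dists.__getitem__)
```

Giving a dimension zero weight for a cluster makes that cluster ignore it. In a sparse, high-dimensional stream, this lets each cluster be defined by its own subspace.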

International Conference on Innovative Computing, Information and Control | 2009

An Improved Storage Algorithm for Multidimensional Data Cube

Haitao He; Yanpeng Zhang; Lining Li; Jiadong Ren; Changzhen Hu

An n-dimensional data cube has 2^n views, and the more views there are, the longer the data cube takes to maintain. Based on the hierarchical structure of dimensions in a data cube, an Improved Storage Algorithm for Multidimensional Data Cube (ISMDC) is proposed in this paper. Dimensions are divided into association dimensions and non-association dimensions, and the concept of an Association Tree Cube is introduced. On non-association dimensions, a hierarchical B+ tree is used to remove redundancy and to form dimension hierarchical encodings. On association dimensions, the encoding of the non-association dimensions, composed of their dimension hierarchical encodings, is used for indexing, so that aggregate values can be retrieved efficiently. The experimental results show that ISMDC reduces storage requirements and maintenance time and improves the efficiency of data cube pattern updates and OLAP queries.
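Two points in the abstract can be made concrete. First, every subset of n dimensions is a group-by view, giving 2^n views. Second, a hierarchical encoding can pack the levels of one dimension into a single integer so that rolling up is a bit shift. The bit-packing scheme and the bit widths below are illustrative assumptions, not ISMDC's B+ tree encoding.

```python
from itertools import combinations


def cuboids(dimensions):
    """All group-by views of a cube over `dimensions`: every subset, 2**n in total."""
    return [c for r in range(len(dimensions) + 1) for c in combinations(dimensions, r)]


def encode(levels, bits):
    """Pack hierarchy levels (coarse to fine), e.g. (year, month, day), into one int."""
    code = 0
    for value, width in zip(levels, bits):
        if not 0 <= value < (1 << width):
            raise ValueError(f"{value} does not fit in {width} bits")
        code = (code << width) | value
    return code


def roll_up(code, bits, keep):
    """Keep the `keep` coarsest levels by shifting out the finer ones."""
    return code >> sum(bits[keep:])
```

With bit widths (4, 4, 5) for a hypothetical (year offset, month, day) hierarchy, `roll_up(encode((3, 7, 15), bits), bits, 2)` equals `encode((3, 7), bits[:2])`. A month-level lookup therefore becomes a prefix comparison on the code.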

Artificial Intelligence and Computational Intelligence | 2009

An Approach for Analyzing Infrequent Software Faults Based on Outlier Detection

Jiadong Ren; Qunhui Wu; Changzhen Hu; Kunsheng Wang

Fault analysis is a critical process in software security. However, identifying outliers among software faults has not been well addressed. In this paper, we define WCFPOF (weighted closed frequent pattern outlier factor) to score complete transactions, and propose a novel approach for detecting outliers based on closed frequent patterns. Closed frequent patterns are discovered and maintained, and the outlier measure of each transaction is computed from them to identify outliers: transactions that contain relatively few closed frequent itemsets. To explain why the detected outlier transactions are infrequent, the contradictive closed frequent patterns of each outlier are identified. Experimental results show that our algorithm has lower running time and better scalability.
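The abstract does not give the weighting used in WCFPOF. The sketch below therefore uses the plain frequent-pattern outlier factor idea instead (mean support of the patterns a transaction contains), restricted to closed frequent itemsets. The closed-itemset miner is brute force and is meant only for tiny illustrative inputs.

```python
from itertools import combinations


def closed_frequent_itemsets(transactions, min_support):
    """Brute force: frequent itemsets with no proper superset of equal support.
    Exponential in the number of items, so only suitable for tiny examples."""
    items = sorted(set().union(*transactions))
    n = len(transactions)
    freq = {}
    for r in range(1, len(items) + 1):
        for combo in combinations(items, r):
            s = frozenset(combo)
            sup = sum(1 for t in transactions if s <= t) / n
            if sup >= min_support:
                freq[s] = sup
    return {s: sup for s, sup in freq.items()
            if not any(s < other and freq[other] == sup for other in freq)}


def outlier_factor(transaction, closed):
    """FPOF-style score: mean support of the closed patterns contained in the
    transaction. Lower scores mean fewer or rarer frequent patterns, i.e. more outlying."""
    if not closed:
        return 0.0
    return sum(sup for s, sup in closed.items() if s <= transaction) / len(closed)
```

For transactions abc, abc, ab, de with a minimum support of 0.5, the closed patterns are {a, b} and {a, b, c}. The transaction de contains neither, so it scores 0 and is the strongest outlier candidate.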

Fuzzy Systems and Knowledge Discovery | 2009

An Improved OLAP Join and Aggregate Algorithm Based on Dimension Hierarchy

Haitao He; Yanpeng Zhang; Jiadong Ren; Changzhen Hu

OLAP (online analytical processing) queries typically run over massive datasets, so performing multi-table join and aggregate operations efficiently becomes the key issue. A Join and Aggregate Algorithm Based on Dimension Hierarchy (JABDH) is proposed in this paper. Because not all dimension hierarchies have semantic characteristics, dimension hierarchical encoding is used to retrieve the matching dimension hierarchies and to evaluate the set of query ranges for semantic dimension hierarchies. To improve the efficiency of multi-table join and aggregate operations for non-semantic dimension hierarchies, join and aggregate operations are translated into operations on a bitmap join index over the fact table. The performance analysis and experimental results show that JABDH improves query speed and the overall efficiency of OLAP queries.
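The bitmap join index idea can be sketched as follows: for each value of a dimension attribute, precompute a bitmap over the fact rows that join to that value. A multi-table join with filters then becomes a bitwise AND followed by one pass over the selected rows. Python integers stand in for bitmaps, and the table and column names in the usage note are hypothetical.

```python
def build_bitmap_join_index(fact_rows, dim_table, fk, attr):
    """attr value -> int bitmap of fact rows whose foreign key `fk` points at a
    dimension row with that value (the join is precomputed once)."""
    index = {}
    for i, row in enumerate(fact_rows):
        value = dim_table[row[fk]][attr]
        index[value] = index.get(value, 0) | (1 << i)
    return index


def sum_where(fact_rows, measure, *bitmaps):
    """AND the bitmaps together, then sum `measure` over the selected fact rows."""
    selected = ~0  # all bits set: no filter yet
    for bm in bitmaps:
        selected &= bm
    return sum(row[measure] for i, row in enumerate(fact_rows) if (selected >> i) & 1)
```

For example, with a `stores` dimension keyed by store id and a `products` dimension keyed by product id, total sales of food in the north region is `sum_where(facts, "sales", region_index["north"], category_index["food"])`. No join is evaluated at query time.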

Archive | 2010