Guojun Gan | Researchain

Publication

Featured research published by Guojun Gan.

Expert Systems With Applications | 2009

A genetic fuzzy k-Modes algorithm for clustering categorical data

Guojun Gan; Jianhong Wu; Zijiang Yang

The fuzzy k-Modes algorithm introduced by Huang and Ng [Huang, Z., & Ng, M. (1999). A fuzzy k-modes algorithm for clustering categorical data. IEEE Transactions on Fuzzy Systems, 7(4), 446-452] is very effective for identifying cluster structures from categorical data sets. However, the algorithm may stop at locally optimal solutions. In order to search for appropriate fuzzy membership matrices which can minimize the fuzzy objective function, we present a hybrid genetic fuzzy k-Modes algorithm in this paper. To circumvent the expensive crossover operator in genetic algorithms (GAs), we hybridize GA with the fuzzy k-Modes algorithm and define the crossover operator as a one-step fuzzy k-Modes algorithm. Experiments on two real data sets are carried out to illustrate the performance of the proposed algorithm.
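
The one-step fuzzy k-Modes operator mentioned in this abstract can be illustrated with a minimal sketch. It assumes that one step means recomputing the cluster modes from the current fuzzy membership matrix and then recomputing the memberships from those modes, using the simple matching (Hamming) distance and a fuzzifier `alpha` greater than 1. The function names, the tie-breaking rule, and the default `alpha` are illustrative assumptions, not taken from the paper.

```python
from collections import defaultdict


def hamming(a, b):
    """Simple matching distance: number of attributes on which a and b differ."""
    return sum(x != y for x, y in zip(a, b))


def update_modes(data, W, alpha):
    """For each cluster and attribute, pick the category with the largest
    total of membership**alpha over the objects that take that category."""
    modes = []
    for memberships in W:
        mode = []
        for j in range(len(data[0])):
            score = defaultdict(float)
            for w, row in zip(memberships, data):
                score[row[j]] += w ** alpha
            # sorted() makes tie-breaking deterministic (an assumption of this sketch)
            mode.append(max(sorted(score), key=score.get))
        modes.append(tuple(mode))
    return modes


def update_memberships(data, modes, alpha):
    """Recompute fuzzy memberships from the distances to the current modes."""
    k = len(modes)
    W = [[0.0] * len(data) for _ in range(k)]
    power = 1.0 / (alpha - 1.0)
    for i, row in enumerate(data):
        d = [hamming(row, z) for z in modes]
        exact = [l for l in range(k) if d[l] == 0]
        if exact:
            # the object coincides with a mode: full membership in that cluster
            W[exact[0]][i] = 1.0
        else:
            for l in range(k):
                W[l][i] = 1.0 / sum((d[l] / d[h]) ** power for h in range(k))
    return W


def one_step_fuzzy_k_modes(data, W, alpha=1.5):
    """One mode update followed by one membership update."""
    modes = update_modes(data, W, alpha)
    return modes, update_memberships(data, modes, alpha)
```

Here `data` is a list of tuples of categorical values and `W[l][i]` is the membership of object `i` in cluster `l`. In a hybrid genetic algorithm of the kind the abstract describes, such a step would be applied to each membership matrix in the population in place of a conventional crossover.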

Pattern Recognition | 2008

A convergence theorem for the fuzzy subspace clustering (FSC) algorithm

Guojun Gan; Jianhong Wu

We establish the convergence of the fuzzy subspace clustering (FSC) algorithm by applying Zangwill's convergence theorem. We show that the iteration sequence produced by the FSC algorithm terminates at a point in the solution set S or there is a subsequence converging to a point in S. In addition, we present experimental results that illustrate the convergence properties of the FSC algorithm in various scenarios.

SIGKDD Explorations | 2004

Subspace clustering for high dimensional categorical data

Guojun Gan; Jianhong Wu

Data clustering has been discussed extensively, but almost all known conventional clustering algorithms tend to break down in high-dimensional spaces because of the inherent sparsity of the data points. Existing subspace clustering algorithms for handling high-dimensional data focus on numerical dimensions. In this paper, we designed an iterative algorithm called SUBCAD for clustering high-dimensional categorical data sets, based on the minimization of an objective function for clustering. Using the objective function, we derived rules for changing cluster memberships. We also designed an objective function to determine the subspace associated with each cluster. We proved various properties of this objective function that are essential for designing a fast algorithm to find the subspace associated with each cluster. Finally, we carried out experiments to show the effectiveness of the proposed method and the algorithm.

Advanced Data Mining and Applications | 2005

A genetic k-modes algorithm for clustering categorical data

Guojun Gan; Zijiang Yang; Jianhong Wu

Many optimization-based clustering algorithms suffer from the possibility of stopping at locally optimal partitions of data sets. In this paper, we present a genetic k-Modes algorithm (GKMODE) that finds a globally optimal partition of a given categorical data set into a specified number of clusters. We introduce a k-Modes operator in place of the normal crossover operator. Our analysis shows that the clustering results produced by GKMODE are highly accurate and that it performs much better than existing algorithms for clustering categorical data.
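
As a rough illustration of a k-Modes operator of the kind this abstract describes, the sketch below assumes that each chromosome is a list of hard cluster labels, one per object, and that the operator performs a single k-Modes step: compute the mode of every non-empty cluster, then reassign each object to its nearest mode under the simple matching distance. The encoding, the function name, and the handling of empty clusters and ties are assumptions of this sketch rather than details from the paper.

```python
from collections import Counter


def k_modes_operator(data, labels, k):
    """One hard k-Modes step applied to a label-string chromosome."""
    n_attrs = len(data[0])

    # Step 1: the mode of each non-empty cluster is its most frequent
    # category in every attribute.
    modes = {}
    for c in range(k):
        members = [row for row, lab in zip(data, labels) if lab == c]
        if members:
            modes[c] = tuple(
                Counter(row[j] for row in members).most_common(1)[0][0]
                for j in range(n_attrs)
            )

    # Step 2: reassign every object to the closest mode; ties go to the
    # cluster with the smallest index.
    def mismatches(row, mode):
        return sum(a != b for a, b in zip(row, mode))

    return [
        min(sorted(modes), key=lambda c: mismatches(row, modes[c]))
        for row in data
    ]
```

In a genetic algorithm, this function would take the place of crossover, while selection and mutation act on the label lists as usual.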

Advanced Data Mining and Applications | 2006

A fuzzy subspace algorithm for clustering high dimensional data

Guojun Gan; Jianhong Wu; Zijiang Yang

In fuzzy clustering algorithms, each object has a fuzzy membership associated with each cluster, indicating the degree of association of the object to the cluster. Here we present a fuzzy subspace clustering algorithm, FSC, in which each dimension has a weight associated with each cluster, indicating the degree of importance of the dimension to the cluster. Using fuzzy techniques for subspace clustering, our algorithm avoids the difficulty of choosing appropriate cluster dimensions for each cluster during the iterations. Our analysis and simulations show that FSC is efficient and that the clustering results it produces are highly accurate.
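
The idea of per-cluster dimension weights can be sketched as follows. The abstract does not state the update rule, so this example assumes an inverse-dispersion rule: within one cluster, a dimension in which the members lie close to the center receives a large weight, the weights of a cluster sum to 1, a fuzzifier `alpha` greater than 1 controls how sharply the weights concentrate, and a small constant `eps` avoids division by zero. The function name and all parameter names are hypothetical.

```python
def dimension_weights(points, center, alpha=2.0, eps=1e-6):
    """Fuzzy dimension weights for one cluster (illustrative rule only)."""
    n_dims = len(center)
    # Per-dimension dispersion of the cluster members around the center.
    disp = [
        sum((p[h] - center[h]) ** 2 for p in points) + eps
        for h in range(n_dims)
    ]
    power = 1.0 / (alpha - 1.0)
    # Smaller dispersion gives a larger weight; weights sum to 1.
    return [
        1.0 / sum((disp[h] / disp[l]) ** power for l in range(n_dims))
        for h in range(n_dims)
    ]
```

For a cluster whose members agree closely in the first dimension but are spread out in the second, nearly all of the weight falls on the first dimension, which is how the relative weights indicate the subspace of a cluster without a hard choice of dimensions.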

Insurance Mathematics & Economics | 2013

Application of Data Clustering and Machine Learning in Variable Annuity Valuation

Guojun Gan

The valuation of variable annuity guarantees has been studied extensively in the past four decades. However, almost all the studies focus on the valuation of guarantees embedded in a single variable annuity contract. How to efficiently price the guarantees for a large portfolio of variable annuity contracts has not received enough attention. This paper fills the gap by introducing a novel method based on data clustering and machine learning to price the guarantees for a large portfolio of variable annuity contracts. Our test results show that this method performs very well in terms of accuracy and speed.

Pattern Recognition | 2015

Subspace clustering using affinity propagation

Guojun Gan; Michael K. Ng

This paper proposes a subspace clustering algorithm by introducing attribute weights in the affinity propagation algorithm. A new step is introduced to the affinity propagation process to iteratively update the attribute weights based on the current partition of the data. The relative magnitude of the attribute weights can be used to identify the subspaces in which clusters are embedded. Experiments on both synthetic data and real data show that the new algorithm outperforms the affinity propagation algorithm in recovering clusters from data.

Highlights: We study the problem of subspace clustering. We propose an algorithm by combining affinity propagation and attribute weighting. The algorithm does not suffer from the cluster center initialization problem. Experiments on synthetic and real data show that the algorithm performs well.

Insurance Mathematics & Economics | 2015

Valuation of large variable annuity portfolios under nested simulation: A functional data approach

Guojun Gan; X. Sheldon Lin

A variable annuity (VA) is an equity-linked annuity product that has rapidly grown in popularity around the world in recent years. Research to date on VAs largely focuses on the valuation of guarantees embedded in a single VA contract. However, methods developed for individual VA contracts based on option pricing theory cannot be extended to large VA portfolios. Insurance companies currently use nested simulation to value guarantees for VA portfolios, but efficient valuation under nested simulation for a large VA portfolio has been a real challenge. The computation in nested simulation is highly intensive and often prohibitive. In this paper, we propose a novel approach that combines a clustering technique with a functional data analysis technique to address the issue. We create a highly non-homogeneous synthetic VA portfolio of 100,000 contracts and use it to estimate the dollar Delta of the portfolio at each time step of outer loop scenarios under the nested simulation framework over a period of 25 years. Our test results show that the proposed approach performs well in terms of accuracy and efficiency.

Pattern Recognition | 2015

Subspace clustering with automatic feature grouping

Guojun Gan; Michael K. Ng

This paper proposes a subspace clustering algorithm with automatic feature grouping for clustering high-dimensional data. In this algorithm, a new component is introduced into the objective function to capture the feature groups and a new iterative process is defined to optimize the objective function so that the features of high-dimensional data are grouped automatically. Experiments on both synthetic data and real data show that the new algorithm outperforms the FG-k-means algorithm in terms of accuracy and choice of parameters.

Highlights: We study the problem of subspace clustering with feature grouping. We propose a k-means-type algorithm by incorporating feature grouping into the objective function. The algorithm is able to determine feature groups automatically. Experiments on synthetic and real data show that the algorithm performs well.

International Joint Conference on Neural Networks | 2006

PARTCAT: A Subspace Clustering Algorithm for High Dimensional Categorical Data

Guojun Gan; Jianhong Wu; Zijiang Yang

A new subspace clustering algorithm, PARTCAT, is proposed to cluster high-dimensional categorical data. The architecture of PARTCAT is based on the recently developed neural network architecture PART, with a major modification to deal with categorical attributes. PARTCAT requires fewer parameters than PART; in particular, it does not need the distance parameter that PART requires and that is intimately related to the similarity in each fixed dimension. Simulations on real data sets are provided to show the performance of PARTCAT.
