William-Chandra Tjhi
Nanyang Technological University
Publications
Featured research published by William-Chandra Tjhi.
Pattern Recognition | 2007
William-Chandra Tjhi; Lihui Chen
In this paper, we propose a new co-clustering algorithm called possibilistic fuzzy co-clustering (PFCC) for automatic categorization of large document collections. PFCC integrates a possibilistic document clustering technique and a combined formulation of fuzzy word ranking and partitioning into a fast iterative co-clustering procedure. This novel framework simultaneously brings several benefits, including robustness in the presence of document and word outliers, rich representations of co-clusters, highly descriptive document clusters, good performance in a high-dimensional space, and reduced sensitivity to initialization in the possibilistic clustering. We present the detailed formulation of PFCC together with the motivations behind it. The advantages over existing works and the algorithm's proof of convergence are provided. Experiments on several large document data sets demonstrate the effectiveness of PFCC.
Fuzzy Sets and Systems | 2008
William-Chandra Tjhi; Lihui Chen
Fuzzy co-clustering is a technique that performs simultaneous fuzzy clustering of objects and features. It is known to be suitable for categorizing high-dimensional data, due to its dynamic dimensionality reduction mechanism achieved through simultaneous feature clustering. We introduce a new fuzzy co-clustering algorithm called Heuristic Fuzzy Co-clustering with Ruspini's condition (HFCR), which addresses several issues in some prominent existing fuzzy co-clustering algorithms. Among these issues are the performance on data sets with overlapping feature clusters and the unnatural representation of feature clusters. The key idea behind HFCR is the formulation of the dual-partitioning approach for fuzzy co-clustering, replacing the existing partitioning-ranking approach. HFCR adopts an efficient and practical heuristic method that can be shown to be more robust than our earlier effort for the dual-partitioning approach. We explain the proposed algorithm in detail and provide an analytical study of its advantages. Experimental results on 10 large benchmark document data sets confirm the effectiveness of the new algorithm.
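The contrast between partitioning and ranking can be illustrated with a small sketch. This is not the HFCR algorithm; it only shows the two normalization constraints, assuming a toy matrix of positive raw weights with one row per cluster and one column per feature. Under Ruspini's condition, each feature's memberships sum to one across clusters (partitioning). Under a ranking-style constraint, memberships sum to one across features within each cluster. The function names and data are hypothetical.

```python
def normalize_across_clusters(weights):
    """Ruspini-style partitioning: for each feature (column),
    memberships over all clusters (rows) sum to 1."""
    n_clusters = len(weights)
    n_features = len(weights[0])
    out = [[0.0] * n_features for _ in range(n_clusters)]
    for j in range(n_features):
        total = sum(weights[c][j] for c in range(n_clusters))
        for c in range(n_clusters):
            out[c][j] = weights[c][j] / total
    return out


def normalize_within_clusters(weights):
    """Ranking-style: for each cluster (row),
    memberships over all features (columns) sum to 1."""
    return [[w / sum(row) for w in row] for row in weights]


# Toy raw weights (assumed positive): 2 clusters x 3 features.
raw = [[4.0, 1.0, 1.0],
       [4.0, 3.0, 1.0]]

partitioned = normalize_across_clusters(raw)  # columns sum to 1
ranked = normalize_within_clusters(raw)       # rows sum to 1
```

In this toy example the first feature is shared equally by both clusters under partitioning (0.5 each), whereas under ranking its value depends on how many other features compete within the same cluster.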
Pattern Recognition Letters | 2006
William-Chandra Tjhi; Lihui Chen
In this paper, a new algorithm, fuzzy co-clustering with Ruspini's condition (FCR), is proposed for co-clustering documents and words. Compared to most existing fuzzy co-clustering algorithms, FCR is able to generate fuzzy word clusters that capture the natural distribution of words, which may be beneficial for information retrieval. We explain the principle behind the algorithm through theoretical discussions and illustrations. These, together with experiments on two standard datasets, show that FCR can discover the naturally existing document-word co-clusters.
IEEE Transactions on Fuzzy Systems | 2009
William-Chandra Tjhi; Lihui Chen
In this paper, we develop a new soft model, dual fuzzy-possibilistic coclustering (DFPC), for document categorization. The proposed model targets robustness to outliers and richer representations of coclusters. DFPC is inspired by an existing algorithm called possibilistic fuzzy C-means (PFCM), which hybridizes fuzzy and possibilistic clustering. It has been shown that PFCM can perform effectively for low-dimensional data clustering. To achieve our goal, we expand this existing idea by introducing a novel PFCM-like coclustering model. The new algorithm DFPC preserves the desired properties of PFCM. In addition, as a coclustering algorithm, DFPC is more suitable for our intended high-dimensional application: document clustering. Moreover, the coclustering mechanism enables DFPC to generate, together with document clusters, fuzzy-possibilistic word memberships. These word memberships, which are absent in the existing PFCM model, can play an important role in generating useful descriptions of document clusters. We detail the formulation of the proposed model and provide an extensive analytical study of the algorithm DFPC. Experiments on an artificial dataset and various benchmark document datasets demonstrate the effectiveness and potential of DFPC.
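To make the distinction between fuzzy and possibilistic memberships concrete, the following is a minimal sketch of the two textbook formulas from distance-based fuzzy C-means and possibilistic C-means. It is not the DFPC or PFCM formulation. The function names, the fuzzifier `m`, the scale parameters `etas`, and the toy distances are assumptions for illustration, and all distances are assumed nonzero.

```python
def fcm_memberships(distances, m=2.0):
    """Fuzzy C-means memberships of one point, given its distance to
    each cluster center. The values always sum to 1 across clusters."""
    exponent = 2.0 / (m - 1.0)
    return [
        1.0 / sum((d_c / d_k) ** exponent for d_k in distances)
        for d_c in distances
    ]


def possibilistic_typicalities(distances, etas, m=2.0):
    """Possibilistic C-means typicalities of one point. Each value
    depends only on the distance to its own cluster, so the values
    need not sum to 1 and shrink toward 0 for far-away points."""
    exponent = 1.0 / (m - 1.0)
    return [
        1.0 / (1.0 + (d * d / eta) ** exponent)
        for d, eta in zip(distances, etas)
    ]


near = [1.0, 3.0]         # a point close to the first center
outlier = [100.0, 100.0]  # a point far from both centers
etas = [1.0, 1.0]         # assumed per-cluster scale parameters

near_fuzzy = fcm_memberships(near)
outlier_fuzzy = fcm_memberships(outlier)
outlier_typical = possibilistic_typicalities(outlier, etas)
```

The outlier receives a fuzzy membership of 0.5 in each cluster, the same as a point lying midway between two nearby centers, while its typicalities are close to zero. This is the usual motivation for combining the two kinds of membership.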
Knowledge and Information Systems | 2013
Yang Yan; Lihui Chen; William-Chandra Tjhi
In this paper, we propose a new semi-supervised fuzzy co-clustering algorithm called SS-FCC for categorization of large web documents. In this new approach, the clustering process is carried out by incorporating prior domain knowledge of a dataset, in the form of pairwise constraints provided by users, into the fuzzy co-clustering framework. Each constraint specifies whether a pair of objects “must” or “cannot” be clustered together. With the help of these constraints, the clustering problem is formulated as the problem of maximizing a competitive agglomeration cost function with fuzzy terms, taking into account the provided domain knowledge. The update rules for fuzzy memberships are derived, and an iterative algorithm is designed for the soft co-clustering process. Our experimental studies show that the quality of clustering results can be improved significantly with the proposed approach. Simulations on 10 large benchmark datasets demonstrate the strength and potential of SS-FCC in terms of performance evaluation criteria, stability and operating time, compared with some of the existing semi-supervised algorithms.
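As a small illustration of what pairwise constraints look like in practice, the sketch below counts how many "must" and "cannot" constraints a hard cluster assignment violates. It is not the SS-FCC cost function, which works with fuzzy memberships. The function name, the index-pair representation of constraints, and the toy labels are assumptions.

```python
def count_violations(labels, must_link, cannot_link):
    """Count pairwise constraints violated by a hard assignment.

    labels[i] is the cluster index of object i; each constraint is a
    pair (a, b) of object indices.
    """
    violated = 0
    for a, b in must_link:
        if labels[a] != labels[b]:   # should share a cluster but do not
            violated += 1
    for a, b in cannot_link:
        if labels[a] == labels[b]:   # should be separated but are not
            violated += 1
    return violated


labels = [0, 0, 1, 1]            # hypothetical assignment of 4 objects
must_link = [(0, 1), (1, 2)]     # pairs that must be clustered together
cannot_link = [(0, 3), (2, 3)]   # pairs that cannot be clustered together

violations = count_violations(labels, must_link, cannot_link)
```

Here the pair (1, 2) breaks a "must" constraint and the pair (2, 3) breaks a "cannot" constraint, so two constraints are violated.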
Cyberworlds | 2005
William-Chandra Tjhi; Lihui Chen
The Web is the largest information repository in the history of mankind. Due to its huge size, however, finding relevant information without an appropriate tool can be virtually impossible. Web document clustering is one possible technique to improve the efficiency of the information-finding process. In this paper, we look into fuzzy co-clustering, which is known to be robust for clustering standard text documents. In our opinion, its robustness can also be extended to Web documents because it can generate descriptive clusters in high dimensions and is able to discover data clusters with overlaps. We consider two existing fuzzy co-clustering algorithms, FCCM and fuzzy Codok. In addition, we propose a new algorithm, FCC-STF, as an alternative to the existing ones. An empirical study of these algorithms on benchmark datasets is presented, together with a performance comparison with a standard fuzzy clustering algorithm, HFCM. The results show that fuzzy co-clustering is generally superior to standard fuzzy clustering in the Web environment, making it a technique with great potential to assist Internet users in discovering relevant information effectively.
International Conference on Control, Automation, Robotics and Vision | 2006
William-Chandra Tjhi; Lihui Chen
Fuzzy co-clustering is an unsupervised technique that performs simultaneous fuzzy clustering of objects and features. In this paper, we propose a new flexible fuzzy co-clustering algorithm which incorporates feature-cluster weighting in the formulation. We call it Flexible Fuzzy Co-clustering with Feature-cluster Weighting (FFCFW). By flexible we mean that the algorithm allows the number of object clusters to be different from the number of feature clusters. There are two motivations behind this work. First, in the fuzzy framework, many co-clustering algorithms still require the number of object clusters to be the same as the number of feature clusters, even though such a rigid structure is hardly found in real-world applications. Second, while there have been numerous attempts at flexible co-clustering, in such schemes the relationships between object and feature clusters are commonly not clearly represented. For this reason, we incorporate a feature-cluster weighting scheme for each object cluster generated by FFCFW so that the relationships between the two types of clusters are manifested in the feature-cluster weights. This enables the new algorithm to generate a more accurate representation of fuzzy co-clusters. FFCFW is formulated by fusing together the core components of two existing algorithms. Like its predecessors, FFCFW adopts an iterative optimization procedure. We discuss in detail the derivation of the proposed algorithm and the advantages it has over other existing works. Experiments on several large benchmark document datasets reveal the feasibility of our proposed algorithm.
International Conference on Signal Processing | 2007
William-Chandra Tjhi; Lihui Chen
Co-clustering is the simultaneous clustering of objects and their features, and is known to be effective for categorization of high-dimensional data. Fuzzy co-clustering is co-clustering in which the resulting co-clusters are represented by fuzzy sets. We introduce a new fuzzy co-clustering algorithm called robust fuzzy co-clustering (RFCC). Existing prominent fuzzy co-clustering algorithms rely solely on a fuzzy C-means-like fuzzy object membership, which is known to be vulnerable to outliers. In RFCC, we propose to incorporate an additional and more robust type of fuzzy object membership to reduce the sensitivity of fuzzy co-clustering to outliers. In this paper, we detail the formulation of RFCC and demonstrate its effectiveness through an experiment on an artificial dataset.
Advanced Data Mining and Applications | 2006
William-Chandra Tjhi; Lihui Chen
Fuzzy co-clustering is a method that performs simultaneous fuzzy clustering of objects and features. In this paper, we introduce a new fuzzy co-clustering algorithm for high-dimensional datasets called Cosine-Distance-based & Dual-partitioning Fuzzy Co-clustering (CODIALING FCC). Unlike many existing fuzzy co-clustering algorithms, CODIALING FCC is a dual-partitioning algorithm. It clusters the features in the same manner as it clusters the objects, that is, by partitioning them according to their natural groupings. It is also a cosine-distance-based algorithm because it utilizes the cosine distance to capture the belongingness of objects and features in the co-clusters. Our main purpose in introducing this new algorithm is to improve on the performance of some prominent existing fuzzy co-clustering algorithms in dealing with datasets with high overlaps. In our opinion, this is crucial since most real-world datasets involve a significant amount of overlap in their inherent clustering structures. We discuss how this improvement can be made through the dual-partitioning formulation adopted. Experimental results on a toy problem and five large benchmark document datasets demonstrate the effectiveness of CODIALING FCC in handling overlaps better.
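For readers unfamiliar with the measure, the sketch below computes cosine distance under the common convention of one minus cosine similarity. How CODIALING FCC incorporates the distance into its memberships is not described here, so this shows only the measure itself. The function name and the toy term-count vectors are assumptions, and both vectors are assumed nonzero.

```python
import math


def cosine_distance(x, y):
    """Cosine distance, taken here as 1 - cosine similarity.

    It depends only on the angle between the vectors, not on their
    lengths, which suits documents of differing sizes.
    """
    dot = sum(a * b for a, b in zip(x, y))
    norm_x = math.sqrt(sum(a * a for a in x))
    norm_y = math.sqrt(sum(b * b for b in y))
    return 1.0 - dot / (norm_x * norm_y)


# Hypothetical term-count vectors over a 3-word vocabulary.
doc_a = [2.0, 0.0, 1.0]
doc_b = [4.0, 0.0, 2.0]   # same word proportions as doc_a, twice as long
doc_c = [0.0, 3.0, 0.0]   # shares no words with doc_a

same_direction = cosine_distance(doc_a, doc_b)
orthogonal = cosine_distance(doc_a, doc_c)
```

Two documents with the same word proportions have a distance of approximately zero regardless of length, and two documents with no words in common have a distance of one.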
International Conference on Service Oriented Computing | 2011
Henry Kasim; Terence Hung; Xiaorong Li; William-Chandra Tjhi; Sifei Lu; Long Wang
The concept of collaborative analytics is to accommodate reuse and collaboration in the data analysis process through the sharing of analytics methods, algorithms, and computation resources. However, realizing collaborative analytics is challenging due to large data sets and high-throughput, computationally intensive requirements. In this demonstration, we present a cloud-based workflow management solution that allows collaborative analytics to run in the cloud computing environment. Our solution provides sharing of analytics resources, recommendation of analytic workflows, dynamic scheduling and provisioning for scalable data analytics, high availability through fault tolerance, and real-time monitoring and tracking of collaborative analytics status. Examples of a generic data mining analysis and climate change analytics are given to show that our work can be applied to a wide variety of real-world studies.