Kelvin Sim
Agency for Science, Technology and Research
Publication
Featured research published by Kelvin Sim.
Data Mining and Knowledge Discovery | 2013
Kelvin Sim; Vivekanand Gopalkrishnan; Arthur Zimek; Gao Cong
Subspace clustering finds sets of objects that are homogeneous in subspaces of high-dimensional datasets, and has been successfully applied in many domains. In recent years, a new breed of subspace clustering algorithms, which we denote as enhanced subspace clustering algorithms, has been proposed to (1) handle the increasing abundance and complexity of data and (2) improve the clustering results. In this survey, we present these enhanced approaches to subspace clustering by discussing the problems they solve, their cluster definitions, and their algorithms. Besides enhanced subspace clustering, we also present basic subspace clustering and related work in high-dimensional clustering.
International Conference on Data Engineering | 2007
Guimei Liu; Jinyan Li; Kelvin Sim; Limsoon Wong
Traditional similarity or distance measurements usually become meaningless when the dimensionality of the datasets increases, which has detrimental effects on clustering performance. In this paper, we propose a distance-based subspace clustering model, called nCluster, to find groups of objects that have similar values on subsets of dimensions. Instead of using a grid-based approach to partition the data space into non-overlapping rectangular cells, as in density-based subspace clustering algorithms, the nCluster model uses a more flexible method to partition the dimensions so as to preserve meaningful and significant clusters. We develop an efficient algorithm to mine only maximal nClusters. A set of experiments is conducted to show the efficiency of the proposed algorithm and the effectiveness of the new model in preserving significant clusters.
International Conference on Data Mining | 2006
Kelvin Sim; Jinyan Li; Vivekanand Gopalkrishnan; Guimei Liu
We introduce an unsupervised process to co-cluster groups of stocks and financial ratios, so that investors can gain more insight into how they are correlated. Our idea for the co-clustering is based on a graph concept called maximal quasi-bicliques, which can tolerate the erroneous and/or missing information that is common in stock and financial ratio data. Compared to previous work, our maximal quasi-bicliques require the errors to be evenly distributed, which enables us to capture more meaningful co-clusters. We develop a new algorithm that can efficiently enumerate maximal quasi-bicliques from an undirected graph. The concept of maximal quasi-bicliques is domain-independent; it can be extended to perform co-clustering on any set of data that is modeled by graphs.
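The "evenly distributed errors" requirement can be illustrated with a small membership check. This is a minimal sketch and not the authors' enumeration algorithm: it assumes a quasi-biclique tolerates at most `eps` missing edges per vertex on either side, and the function name, the `eps` parameter, and the stock and ratio labels are all hypothetical.

```python
def is_quasi_biclique(adj, left, right, eps):
    """Check a candidate co-cluster under an assumed per-vertex tolerance.

    adj maps each left vertex (e.g. a stock) to the set of right vertices
    (e.g. financial ratios) it is linked to. Every vertex on either side may
    miss at most `eps` edges to the other side, so errors cannot pile up on
    a single vertex.
    """
    left, right = set(left), set(right)
    for u in left:
        # Right vertices that u is NOT connected to.
        if len(right - adj.get(u, set())) > eps:
            return False
    for v in right:
        # Left vertices that are NOT connected to v.
        missing = sum(1 for u in left if v not in adj.get(u, set()))
        if missing > eps:
            return False
    return True


# Hypothetical toy data: three stocks and three financial ratios.
adj = {
    "S1": {"PE", "ROE", "DY"},
    "S2": {"PE", "ROE"},
    "S3": {"ROE", "DY"},
}
stocks = ["S1", "S2", "S3"]
ratios = ["PE", "ROE", "DY"]

print(is_quasi_biclique(adj, stocks, ratios, eps=1))
print(is_quasi_biclique(adj, stocks, ratios, eps=0))
```

With `eps=1` the toy co-cluster is accepted because each stock and each ratio misses at most one link; with `eps=0` the check reduces to an ordinary biclique test and the same candidate is rejected.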
International Conference on Data Mining | 2010
Kelvin Sim; Zeyar Aung; Vivekanand Gopalkrishnan
Subspace clusters represent useful information in high-dimensional data. However, mining significant subspace clusters in continuous-valued 3D data, such as stock-financial ratio-year data or gene-sample-time data, is difficult. First, typical metrics either find subspaces with very few objects, or they find too many insignificant subspaces – those which exist by chance. Second, typical 3D subspace clustering approaches abound with parameters, which are usually set under biased assumptions, making the mining process a ‘guessing game’. We address these concerns by proposing an information-theoretic measure, which allows us to identify 3D subspace clusters that stand out from the data. We also develop a highly effective, efficient and parameter-robust algorithm, which is a hybrid of information-theoretic and statistical techniques, to mine these clusters. Through extensive experiments, we show that our approach can discover significant 3D subspace clusters embedded in 110 synthetic datasets of varying conditions. We also perform a case study on real-world stock datasets, which shows that our clusters can generate higher profits than those mined by other approaches.
Data Warehousing and Knowledge Discovery | 2006
Guimei Liu; Kelvin Sim; Jinyan Li
Many real world applications rely on the discovery of maximal biclique subgraphs (complete bipartite subgraphs). However, existing algorithms for enumerating maximal bicliques are not very efficient in practice. In this paper, we propose an efficient algorithm to mine large maximal biclique subgraphs from undirected graphs. Our algorithm uses a divide-and-conquer approach. It effectively uses the size constraints on both vertex sets to prune unpromising bicliques and to reduce the search space iteratively during the mining process. The time complexity of the proposed algorithm is O(ndN), where n is the number of vertices, d is the maximal degree of the vertices and N is the number of maximal bicliques. Our performance study shows that the proposed algorithm outperforms previous work significantly.
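To make the notion of a size-constrained maximal biclique concrete, the following is a brute-force sketch for very small graphs. It is not the divide-and-conquer algorithm described in the abstract and does not achieve its O(ndN) bound; it simply closes every vertex subset under common neighbourhoods. The function names and the single `min_size` parameter (applied to both vertex sets) are assumptions made for illustration.

```python
from itertools import combinations


def common_neighbors(adj, vertices):
    """Vertices adjacent to every vertex in the non-empty set `vertices`."""
    return set.intersection(*(adj[v] for v in vertices))


def maximal_bicliques(adj, min_size=1):
    """Enumerate maximal bicliques of a small undirected graph by brute force.

    adj maps each vertex to its neighbour set (no self-loops). A seed set is
    closed into (left, right), where right is the common neighbourhood of the
    seed and left is the common neighbourhood of right. Exponential time.
    """
    found = set()
    nodes = sorted(adj)
    for k in range(1, len(nodes) + 1):
        for seed in combinations(nodes, k):
            right = common_neighbors(adj, seed)
            if not right:
                continue
            left = common_neighbors(adj, right)
            # Size constraint on both vertex sets.
            if len(left) >= min_size and len(right) >= min_size:
                # Store as an unordered pair so (X, Y) and (Y, X) coincide.
                found.add(frozenset((frozenset(left), frozenset(right))))
    return found


# Toy graph with edges a-x, a-y, b-x, b-y, c-y.
graph = {
    "a": {"x", "y"},
    "b": {"x", "y"},
    "c": {"y"},
    "x": {"a", "b"},
    "y": {"a", "b", "c"},
}

for pair in maximal_bicliques(graph, min_size=2):
    print([sorted(side) for side in pair])
```

On the toy graph, the only maximal biclique with at least two vertices on each side is {a, b} with {x, y}; lowering `min_size` to 1 also admits {a, b, c} with {y}.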
International Conference on e-Health Networking, Applications and Services | 2010
Kelvin Sim; Ghim-Eng Yap; Clifton Phua; Jit Biswas; Aung Aung Phyo Wai; Andrei Tolstikov; Weimin Huang; Philip Yap
Using ambient intelligence to assist people with dementia in carrying out their Activities of Daily Living (ADLs) independently in a smart home environment is an important research area, given the projected increase in the number of people with dementia. We present a system and algorithms for the automated recognition of ADLs; the ADLs are expressed as plans made up of encoded sequences of micro-context information gathered by sensors in a smart home. Previously, the Erroneous-Plan Recognition (EPR) system was developed specifically to handle the wide spectrum of micro-contexts from multiple sensing modalities. The EPR system monitors the person with dementia and determines whether the person has executed a correct or erroneous ADL. However, due to noisy readings from the sensing modalities, the EPR system has problems accurately detecting erroneous ADLs. We propose to improve the accuracy of the EPR system with two new key components. First, we model the smart home environment as a Markov decision process (MDP), with the EPR system built upon it. Simple referencing of this model allows us to filter erroneous readings of the sensing modalities. Second, we use the reinforcement learning concept of probability and reward to infer erroneous readings that are not filtered by the first key component. We conducted extensive experiments and showed that the accuracy of the new EPR system is 26.2% higher than that of the previous system, and that it is therefore a better system for ambient assistive living applications.
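The first component, filtering readings by referencing a model of the environment, can be sketched as a lookup against transition probabilities. This is only an illustrative simplification under stated assumptions: it uses a plain transition table rather than a full MDP with actions and rewards, and the micro-context names, probabilities, and the `filter_readings` function are hypothetical rather than taken from the EPR system.

```python
# Hypothetical transition probabilities between micro-contexts.
TRANSITIONS = {
    "enter_kitchen": {"open_cupboard": 0.7, "turn_on_tap": 0.3},
    "open_cupboard": {"take_cup": 0.9, "turn_on_tap": 0.1},
    "take_cup": {"turn_on_tap": 1.0},
    "turn_on_tap": {},
}


def filter_readings(readings, transitions, min_prob=0.0):
    """Drop readings that the model says cannot follow the last accepted one.

    The first reading is always accepted. A later reading is kept only if
    the modelled probability of moving to it from the last accepted reading
    exceeds `min_prob`; otherwise it is treated as sensor noise.
    """
    accepted = []
    for state in readings:
        if not accepted:
            accepted.append(state)
            continue
        prob = transitions.get(accepted[-1], {}).get(state, 0.0)
        if prob > min_prob:
            accepted.append(state)
    return accepted


# "take_cup" directly after "enter_kitchen" has no modelled transition,
# so the first occurrence is filtered out as a likely erroneous reading.
noisy = ["enter_kitchen", "take_cup", "open_cupboard", "take_cup", "turn_on_tap"]
print(filter_readings(noisy, TRANSITIONS))
```

Raising `min_prob` makes the filter stricter, at the risk of discarding unusual but genuine behaviour; readings that pass this filter would still need the second, probability-and-reward-based component described above.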
IEEE Transactions on Knowledge and Data Engineering | 2013
Kelvin Sim; Ghim-Eng Yap; David R. Hardoon; Vivekanand Gopalkrishnan; Gao Cong; Suryani Lukman
Actionable 3D subspace clustering from real-world continuous-valued 3D (i.e., object-attribute-context) data promises tangible benefits such as the discovery of biologically significant protein residues and profitable stocks, but existing algorithms are inadequate for solving this clustering problem; most of them are not actionable (i.e., able to suggest profitable or beneficial actions to users), do not allow incorporation of domain knowledge, and are parameter sensitive, i.e., a wrong threshold setting reduces the cluster quality. Moreover, the 3D structure of the data complicates this clustering problem. We propose a centroid-based actionable 3D subspace clustering framework, named CATSeeker, which allows incorporation of domain knowledge, and achieves parameter insensitivity and excellent performance through a unique combination of singular value decomposition, numerical optimization, and 3D frequent itemset mining. Experimental results on synthetic, protein structural, and financial data show that CATSeeker significantly outperforms all the competing methods in terms of efficiency, parameter insensitivity, and cluster usefulness.
Information Sciences | 2011
Kelvin Sim; Guimei Liu; Vivekanand Gopalkrishnan; Jinyan Li
Stocks with similar financial ratio values across years have similar price movements. We investigate this hypothesis by clustering groups of stocks that exhibit homogeneous financial ratio values across years, and then studying their price movements. We propose using cross-graph quasi-biclique (CGQB) subgraphs to cluster stocks, as they can define the three-dimensional (3D) subspaces of financial ratios in which the stocks are homogeneous across the years, and they can also handle the missing values that are rampant in stock data. Furthermore, investors can easily analyze these 3D subspaces to explore the relations between the stocks and financial ratios. We develop a novel algorithm, CGQBminer, which mines the complete set of CGQB subgraphs from the stock data. Through experimental analysis, we show that the hypothesis is valid. Furthermore, we demonstrate that an investment strategy which uses groups of stocks mined by CGQB subgraphs has higher returns than one that does not. We also conducted an extensive performance analysis of CGQBminer, and show that it is efficient across different 3D datasets and parameter settings.
Asia Pacific Bioinformatics Conference | 2007
Suryani Lukman; Kelvin Sim; Jinyan Li; Yi-Ping Phoebe Chen
To assess the physico-chemical characteristics of protein-protein interactions, protein sequences and overall structural folds have been analyzed previously. To highlight these characteristics, discovery and examination of amino acid patterns at the binding sites defined by structural proximity in 3-dimensional (3D) space are essential. In this paper, we investigate the interacting preferences of 3D pattern pairs discovered separately in transient and obligate protein complexes. These 3D pattern pairs are not necessarily sequence-consecutive, but each residue in two groups of amino acids from two proteins in a complex is within a certain Å threshold of most residues in the other group. We develop an algorithm called AA-pairs, by which every pair of interacting proteins is represented as a bipartite graph, and which discovers all maximal quasi-bicliques from every bipartite graph to form our 3D pattern pairs. From 112 and 2533 highly conserved 3D pattern pairs discovered in the transient and obligate complexes respectively, we observe that Ala and Leu are the most frequently occurring amino acids in interacting 3D patterns of transient (20.91%) and obligate (33.82%) complexes, respectively. From the study of the dipeptide composition on each side of interacting 3D pattern pairs, dipeptides Ala-Ala and Ala-Leu are popular in 3D patterns of both transient and obligate complexes. Interactions between amino acids with a large hydrophobicity difference are more common in the transient than in the obligate complexes. On the contrary, in obligate complexes, interactions between hydrophobic residues account for the top 5 most frequently occurring amino acid pairings.
International Conference on Smart Homes and Health Telematics | 2012
Vwen Yen Lee; Yan Liu; Xian Zhang; Clifton Phua; Kelvin Sim; Jiaqi Zhu; Jit Biswas; Jin Song Dong; Mounir Mokhtari
Activity recognition within ambient environments is a highly non-trivial process. Such procedures can be managed using rule-based systems for monitoring human behavior. However, the design and verification of such systems are laborious and time-consuming. We present a rule verification system that uses model checking techniques to ensure rule validity. This system also corrects erroneous rules automatically, thereby reducing reliance on manual rule checking, verification and correction.