Swee Chuan Tan | Researchain

Publication

Featured research published by Swee Chuan Tan.

International Joint Conference on Artificial Intelligence | 2011

Fast anomaly detection for streaming data

Swee Chuan Tan; Kai Ming Ting; Tony Fei Liu

This paper introduces Streaming Half-Space-Trees (HS-Trees), a fast one-class anomaly detector for evolving data streams. It requires only normal data for training and works well when anomalous data are rare. The model features an ensemble of random HS-Trees, and the tree structure is constructed without any data. This makes the method highly efficient because it requires no model restructuring when adapting to evolving data streams. Our analysis shows that Streaming HS-Trees has constant amortised time complexity and constant memory requirement. When compared with a state-of-the-art method, our method performs favourably in terms of detection accuracy and runtime performance. Our experimental results also show that the detection performance of Streaming HS-Trees is not sensitive to its parameter settings.
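The idea of building the tree structure without data and then only updating counts can be illustrated with a minimal sketch. It is a simplification under stated assumptions, not the authors' implementation: the workspace is assumed to be pre-scaled to known bounds, splits are taken at the midpoint of a randomly chosen dimension, and the names `Node`, `build_tree`, `update_mass`, `score`, `roll_window` and the `size_limit` parameter are illustrative. A low score suggests an anomaly.

```python
import random


class Node:
    """One node of a half-space tree; r and l are mass (point count) profiles."""

    def __init__(self, depth):
        self.depth = depth
        self.r = 0          # mass recorded from the reference window
        self.l = 0          # mass recorded from the latest window
        self.dim = None     # split dimension (None for a leaf)
        self.split = None   # split value
        self.left = None
        self.right = None


def build_tree(mins, maxs, depth, max_depth, rng):
    """Build the tree from the workspace bounds only; no data is needed."""
    node = Node(depth)
    if depth == max_depth:
        return node
    q = rng.randrange(len(mins))            # random dimension
    p = (mins[q] + maxs[q]) / 2.0           # bisect the current range
    node.dim, node.split = q, p
    left_maxs = list(maxs)
    left_maxs[q] = p
    right_mins = list(mins)
    right_mins[q] = p
    node.left = build_tree(mins, left_maxs, depth + 1, max_depth, rng)
    node.right = build_tree(right_mins, maxs, depth + 1, max_depth, rng)
    return node


def update_mass(root, x, reference):
    """Increment the mass of every node that x passes through."""
    node = root
    while node is not None:
        if reference:
            node.r += 1
        else:
            node.l += 1
        if node.dim is None:
            break
        node = node.left if x[node.dim] < node.split else node.right


def score(root, x, size_limit):
    """Descend until the node is sparse or a leaf, then weight mass by depth."""
    node = root
    while node.dim is not None and node.r > size_limit:
        node = node.left if x[node.dim] < node.split else node.right
    return node.r * (2 ** node.depth)


def roll_window(node):
    """At the end of a window, the latest profile becomes the reference."""
    if node is None:
        return
    node.r, node.l = node.l, 0
    roll_window(node.left)
    roll_window(node.right)
```

In use, an ensemble of such trees would be built once, each arriving point would be scored against the reference masses and then recorded with `update_mass(tree, x, reference=False)`, and `roll_window` would be called on every tree when a window fills. Only counters change, which is why no restructuring is needed as the stream evolves.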

Machine Learning | 2011

Feature-subspace aggregating: ensembles for stable and unstable learners

Kai Ming Ting; Jonathan R. Wells; Swee Chuan Tan; Shyh Wei Teng; Geoffrey I. Webb

This paper introduces a new ensemble approach, Feature-Subspace Aggregating (Feating), which builds local models instead of global models. Feating is a generic ensemble approach that can enhance the predictive performance of both stable and unstable learners. In contrast, most existing ensemble approaches can improve the predictive performance of unstable learners only. Our analysis shows that the new approach reduces the execution time to generate a model in an ensemble through an increased level of localisation in Feating. Our empirical evaluation shows that Feating performs significantly better than Boosting, Random Subspace and Bagging in terms of predictive accuracy when a stable learner, SVM, is used as the base learner. The speed-up achieved by Feating makes feasible SVM ensembles that would otherwise be infeasible for large data sets. When SVM is the preferred base learner, we show that Feating SVM performs better than Boosting decision trees and Random Forests. We further demonstrate that Feating also substantially reduces the error of another stable learner, k-nearest neighbour, and an unstable learner, decision tree.

Machine Learning | 2013

Mass estimation

Kai Ming Ting; Guang-Tong Zhou; Fei Tony Liu; Swee Chuan Tan

This paper introduces mass estimation—a base modelling mechanism that can be employed to solve various tasks in machine learning. We present the theoretical basis of mass and efficient methods to estimate mass. We show that mass estimation solves problems effectively in tasks such as information retrieval, regression and anomaly detection. The models, which use mass in these three tasks, perform at least as well as and often better than eight state-of-the-art methods in terms of task-specific performance measures. In addition, mass estimation has constant time and space complexities.

Pattern Recognition | 2011

A general stochastic clustering method for automatic cluster discovery

Swee Chuan Tan; Kai Ming Ting; Shyh Wei Teng

Finding clusters in data is a challenging problem. Given a dataset, we usually do not know the number of natural clusters hidden in the dataset. The problem is exacerbated when there is little or no additional information except the data itself. This paper proposes a general stochastic clustering method that is a simplification of the nature-inspired ant-based clustering approach. It begins with a basic solution and then performs stochastic search to incrementally improve the solution until the underlying clusters emerge, resulting in automatic cluster discovery in datasets. This method differs from several recent methods in that it does not require users to input the number of clusters and it makes no explicit assumption about the underlying distribution of a dataset. Our experimental results show that the proposed method performs better than several existing methods in terms of clustering accuracy and efficiency in the majority of the datasets used in this study. Our theoretical analysis shows that the proposed method has linear time and space complexities, and our empirical study shows that it can accurately and efficiently discover clusters in large datasets on which many existing methods fail to run.

IEEE International Conference on Evolutionary Computation | 2006

Reproducing the Results of Ant-based Clustering Without Using Ants

Swee Chuan Tan; Kai Ming Ting; Shyh Wei Teng

In this paper, we remove the ant-metaphor from ant-based clustering using a randomised partitioning method followed by an agglomerative clustering procedure. While our model only adopts part of the ant-based heuristics, it has produced results that are comparable to the ant-based model. Our approach is based on the fact that one ant can produce the same results as many ants in the models that we have studied, and these models function like stochastic sampling algorithms. In addition, we introduce a schedule to terminate the clustering process before the maximum number of iterations has been reached. We also improve the runtime stability of our model with respect to changes in the structures of the data sets.

DaEng | 2014

Time Series Clustering: A Superior Alternative for Market Basket Analysis

Swee Chuan Tan; Jess Pei San Lau

Market Basket Analysis often involves applying the de facto association rule mining method on massive sales transaction data. In this paper, we argue that association rule mining is not always the most suitable method for analysing big market-basket data. This is because the data matrix to be used for association rule mining is usually large and sparse, resulting in sluggish generation of many trivial rules with little insight. To address this problem, we summarise a real-world sales transaction data set into time series format. We then use time series clustering to discover commonly purchased items that are useful for pricing or formulating cross-selling strategies. We show that this approach uses a data set that is substantially smaller than the data to be used for association analysis. In addition, it reveals significant patterns and insights that are otherwise hard to uncover when using association analysis.
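The summarisation step described above can be sketched as follows. This is only an illustration under assumptions: the abstract does not state the time granularity or the record layout, so weekly totals and `(date, item, quantity)` tuples are assumed here, and the function name `to_weekly_series` is hypothetical.

```python
from collections import defaultdict
from datetime import date


def to_weekly_series(transactions):
    """Collapse (date, item, quantity) records into one weekly series per item.

    Only weeks that appear in the data become time steps; gaps between
    observed weeks are not filled in this sketch.
    """
    # Each time step is an ISO (year, week) pair.
    weeks = sorted({d.isocalendar()[:2] for d, _, _ in transactions})
    index = {week: i for i, week in enumerate(weeks)}
    series = defaultdict(lambda: [0] * len(weeks))
    for d, item, qty in transactions:
        series[item][index[d.isocalendar()[:2]]] += qty
    return weeks, dict(series)
```

The result has one short row per item rather than one sparse row per transaction, which is the size reduction the abstract refers to; the rows can then be passed to any time series clustering routine.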

Australian Conference on Artificial Life | 2007

Examining dissimilarity scaling in ant colony approaches to data clustering

Swee Chuan Tan; Kai Ming Ting; Shyh Wei Teng

In this paper, we provide the reasons why the dissimilarity-scaling parameter (α) in the neighbourhood function of ant-based clustering is critical for detecting the correct number of clusters in data sources. We then examine a recently proposed method named ATTA; we show that there is no need to use a population of α-adaptive ants to reproduce ATTA's results. We devise a method to estimate a fixed (i.e., non-adaptive) single value of α for each dataset. We also introduce a simplified version of ATTA, called SATTA. The reason for introducing SATTA is two-fold: first, to test our proposed α-estimation method; and, second, to simulate ant-based clustering from a purely stochastic perspective. SATTA omits the ant colony but reuses important ant heuristics. Experimental results show that SATTA generally performs better than ATTA on clusters with different densities and clusters that are elongated. Finally, we show that the results can be further improved using a majority voting scheme.

International Conference on Conceptual Structures | 2011

Simplifying and improving ant-based clustering

Swee Chuan Tan; Kai Ming Ting; Shyh Wei Teng

Ant-based clustering (ABC) is a data clustering approach inspired by cemetery formation activities observed in real ant colonies. Building upon the premise of collective intelligence, such an approach uses multiple ant-like agents and a mixture of heuristics in order to create systems that are capable of clustering real-world data. Many recently proposed ABC systems have shown competitive results, but these systems are geared towards adding new heuristics, resulting in increasingly complex systems that are harder to understand and improve. In contrast to this direction, we demonstrate that a state-of-the-art ABC system can be systematically evaluated and then simplified. The streamlined model, which we call SABC, differs fundamentally from traditional ABC systems as it does not use the ant colony and several key components. Yet, our empirical study shows that SABC performs more effectively and efficiently than the state-of-the-art ABC system.

International Conference on Industrial, Engineering and Other Applications of Applied Intelligent Systems | 2015

Finding Similar Time Series in Sales Transaction Data

Swee Chuan Tan; Pei San Lau; XiaoWei Yu

This paper studies the problem of finding similar time series of product sales in transactional data. We argue that finding such similar time series can lead to the discovery of interesting and actionable business information such as previously unknown complementary products or substitutes, and hidden supply chain information. However, comparing all possible pairs of n time series exhaustively results in O(n^2) time complexity. To address this issue, we propose using the k-means clustering method to create small clusters of similar time series, and those clusters with very small intra-cluster variability are used to find similar time series. Finally, we demonstrate the utility of our approach to derive interesting results from real-life data.
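A minimal sketch of this cluster-then-filter idea is shown below. It assumes equal-length series, squared Euclidean distance, and mean squared distance to the centroid as the variability measure; none of these choices is confirmed by the abstract. The names `kmeans`, `similar_pairs` and the `max_variability` threshold are illustrative, and the k-means routine is a plain Lloyd-style loop rather than the authors' code.

```python
import random
from itertools import combinations


def sq_dist(a, b):
    """Squared Euclidean distance between two equal-length series."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def kmeans(series, k, iters=50, seed=0):
    """Plain Lloyd-style k-means; returns (labels, centroids)."""
    rng = random.Random(seed)
    centroids = [list(s) for s in rng.sample(series, k)]
    labels = None
    for _ in range(iters):
        new_labels = [
            min(range(k), key=lambda c: sq_dist(s, centroids[c]))
            for s in series
        ]
        if new_labels == labels:      # assignments are stable: converged
            break
        labels = new_labels
        for c in range(k):
            members = [s for s, lab in zip(series, labels) if lab == c]
            if members:               # keep the old centroid if a cluster empties
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return labels, centroids


def similar_pairs(names, series, k, max_variability):
    """Report pairs only from clusters whose members sit close to the centroid."""
    labels, centroids = kmeans(series, k)
    pairs = []
    for c in range(k):
        idx = [i for i, lab in enumerate(labels) if lab == c]
        if len(idx) < 2:
            continue
        variability = sum(sq_dist(series[i], centroids[c]) for i in idx) / len(idx)
        if variability <= max_variability:
            pairs.extend((names[i], names[j]) for i, j in combinations(idx, 2))
    return pairs
```

Pairs are enumerated only inside small, tight clusters, so the quadratic comparison is paid per cluster rather than over all n series.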

Congress on Evolutionary Computation | 2012

Simplifying and improving swarm-based clustering

Swee Chuan Tan

Swarm-based clustering (SBC) has enthused researchers for its ability to find clusters in datasets automatically, without requiring users to specify the number of clusters. While conventional wisdom suggests that swarm intelligence contributes to this ability, recent work has provided an alternative explanation based on the underlying stochastic heuristics that are really at work. This paper shows that the working principles of several recent SBC methods can be explained using a stochastic clustering framework that is unrelated to swarm intelligence. The framework is theoretically simple and easy to implement in practice. We also incorporate a mechanism to calibrate a key parameter so as to enhance the clustering performance. Despite the simplicity of the enhanced algorithm, experimental results show that it outperforms two recent SBC methods in terms of clustering accuracy and efficiency in the majority of the datasets used in this study.
