Fei Tony Liu
Monash University
Publication
Featured research published by Fei Tony Liu.
ACM Transactions on Knowledge Discovery From Data | 2012
Fei Tony Liu; Kai Ming Ting; Zhi-Hua Zhou
Anomalies are data points that are few and different. As a result of these properties, we show that anomalies are susceptible to a mechanism called isolation. This article proposes a method called Isolation Forest (iForest), which detects anomalies purely based on the concept of isolation without employing any distance or density measure, making it fundamentally different from all existing methods. As a result, iForest is able to exploit subsampling (i) to achieve a low linear time complexity and a small memory requirement and (ii) to deal with the effects of swamping and masking effectively. Our empirical evaluation shows that iForest outperforms ORCA, one-class SVM, LOF and Random Forests in terms of AUC and processing time, and that it is robust against masking and swamping effects. iForest also works well in high-dimensional problems containing a large number of irrelevant attributes, and when anomalies are not available in the training sample.
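The isolation idea can be sketched in a few lines of Python. This is a toy illustration of the concept only, not the paper's algorithm; the data, number of trees and depth limit below are arbitrary choices made here. A point that is "few and different" tends to be separated from the rest by fewer random splits, so its average path length over many random partitions is shorter.

```python
import random

def isolation_path_length(point, data, rng, max_depth=10, depth=0):
    """Depth at which `point` is isolated by recursive random splits of `data`."""
    if depth >= max_depth or len(data) <= 1:
        return depth
    dim = rng.randrange(len(point))
    lo = min(x[dim] for x in data)
    hi = max(x[dim] for x in data)
    if lo == hi:
        return depth
    split = rng.uniform(lo, hi)
    # Keep only the points on the same side of the split as `point`.
    side = [x for x in data if (x[dim] < split) == (point[dim] < split)]
    return isolation_path_length(point, side, rng, max_depth, depth + 1)

rng = random.Random(0)
normal = [(rng.gauss(0, 1), rng.gauss(0, 1)) for _ in range(256)]
anomaly = (6.0, 6.0)

def avg_path(p, trials=100):
    return sum(isolation_path_length(p, normal, rng) for _ in range(trials)) / trials

# The anomalous point is typically isolated closer to the root
# (shorter average path) than a typical point from the normal cluster.
print("anomaly:", avg_path(anomaly))
print("normal :", avg_path(normal[0]))
```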
Knowledge Discovery and Data Mining | 2010
Kai Ming Ting; Guang-Tong Zhou; Fei Tony Liu; James Swee Chuan Tan
This paper introduces mass estimation, a base modelling mechanism in data mining. It provides the theoretical basis of mass and an efficient method to estimate mass. We show that it solves problems very effectively in tasks such as information retrieval, regression and anomaly detection. The models, which use mass in these three tasks, perform at least as well as, and often better than, a total of eight state-of-the-art methods in terms of task-specific performance measures. In addition, mass estimation has constant time and space complexities.
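As a rough illustration of the intuition behind mass (this is not the paper's estimator; the one-dimensional setting, split scheme and constants below are simplifications chosen here), the sketch estimates the mass of a point as the average number of data points falling on its side of a random split. Central, typical points receive higher mass than fringe points, so low mass can serve as an anomaly score.

```python
import random

def mass_estimate(x, data, rng, n_splits=200):
    """Toy 1-D mass estimate: average, over random splits of the data range,
    of the number of points lying on the same side of the split as x."""
    lo, hi = min(data), max(data)
    total = 0
    for _ in range(n_splits):
        s = rng.uniform(lo, hi)
        same_side = [v for v in data if (v < s) == (x < s)]
        total += len(same_side)
    return total / n_splits

rng = random.Random(1)
data = [rng.gauss(0, 1) for _ in range(500)]

print("mass near centre:", mass_estimate(0.0, data, rng))
print("mass near fringe:", mass_estimate(4.0, data, rng))
```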
European Conference on Machine Learning | 2010
Fei Tony Liu; Kai Ming Ting; Zhi-Hua Zhou
Detecting local clustered anomalies is an intricate problem for many existing anomaly detection methods. Distance-based and density-based methods are inherently restricted by their basic assumptions: anomalies are either far from normal points or sparse. Clustered anomalies are able to avoid detection since they defy these assumptions by being dense and, in many cases, in close proximity to normal instances. In this paper, without using any density or distance measure, we propose a new method called SCiForest to detect clustered anomalies. SCiForest separates clustered anomalies from normal points effectively even when clustered anomalies are very close to normal points. It maintains the ability of existing methods to detect scattered anomalies, and it has superior time and space complexities compared with existing distance-based and density-based methods.
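A small synthetic example may make the setting concrete (the cluster location, spread and sizes below are arbitrary choices, not taken from the paper): clustered anomalies form a tight group that is neither distant nor sparse, which is exactly the case that defeats the distance and density assumptions.

```python
import random

rng = random.Random(2)

# Normal data: a broad cluster around the origin.
normal = [(rng.gauss(0, 1), rng.gauss(0, 1)) for _ in range(1000)]

# Clustered anomalies: a small, tight (hence locally dense) cluster sitting
# close to the normal data -- neither "far away" nor "sparse", so distance-
# and density-based detectors have little margin to separate them.
clustered_anomalies = [(rng.gauss(2.5, 0.05), rng.gauss(2.5, 0.05))
                       for _ in range(30)]
```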
Pattern Recognition | 2012
Guang-Tong Zhou; Kai Ming Ting; Fei Tony Liu; Yilong Yin
This paper presents a novel ranking framework for content-based multimedia information retrieval (CBMIR). The framework introduces relevance features and a new ranking scheme. Each relevance feature measures the relevance of an instance with respect to a profile of the targeted multimedia database. We show that the task of CBMIR can be done more effectively using the relevance features than the original features. Furthermore, additional performance gain is achieved by incorporating our new ranking scheme which modifies instance rankings based on the weighted average of relevance feature values. Experiments on image and music databases validate the efficacy and efficiency of the proposed framework.
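A schematic sketch of the re-ranking step described above, with placeholder relevance feature values and weights (the feature definitions, weighting and instance names here are illustrative assumptions, not the paper's): instances are ordered by the weighted average of their relevance feature values.

```python
from typing import Dict, List

def rerank(relevance_features: Dict[str, List[float]],
           weights: List[float]) -> List[str]:
    """Rank instances by the weighted average of their relevance feature
    values (higher score = more relevant)."""
    def score(values: List[float]) -> float:
        return sum(w * v for w, v in zip(weights, values)) / sum(weights)
    return sorted(relevance_features,
                  key=lambda k: score(relevance_features[k]),
                  reverse=True)

# Three candidate instances, each described by two relevance feature values.
features = {"img_a": [0.9, 0.4], "img_b": [0.6, 0.8], "img_c": [0.2, 0.3]}
print(rerank(features, weights=[0.5, 0.5]))  # ['img_b', 'img_a', 'img_c']
```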
Knowledge Discovery and Data Mining | 2005
Fei Tony Liu; Kai Ming Ting; Wei Fan
One way to lower the generalization error of a decision tree ensemble is to maximize tree diversity. Building complete-random trees forgoes the strength obtained from a test selection criterion, but it achieves higher tree diversity. We provide a taxonomy of different randomization methods and find that complete-random test selection produces diverse trees, whereas other randomization methods such as bootstrap sampling may impair tree growth and limit tree diversity. The well-accepted practice in constructing decision tree ensembles is to apply bootstrap sampling and voting. To challenge this practice, we explore eight variants of complete-random trees using three parameters: ensemble methods, tree height restriction and sample randomization. Surprisingly, the most accurate variant is very simple and performs comparably to Bagging and Random Forests. It achieves good results by maximizing tree diversity and is called Max-diverse Ensemble.
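The complete-random construction can be sketched as follows. This is a minimal toy implementation for illustration only; the dataset, height limit, ensemble size and the probability-averaging combination are choices made here, not details taken from the paper. Each split uses a randomly chosen attribute and a random cut point, with no test selection criterion, and trees are combined by averaging leaf class distributions.

```python
import random
from collections import Counter

def build_tree(X, y, rng, max_depth):
    """Grow a complete-random tree: every split uses a randomly chosen
    attribute and a random cut point (no split-quality criterion)."""
    if max_depth == 0 or len(set(y)) <= 1:
        return Counter(y)                      # leaf: class counts
    dim = rng.randrange(len(X[0]))
    values = [x[dim] for x in X]
    if min(values) == max(values):
        return Counter(y)
    cut = rng.uniform(min(values), max(values))
    left = [i for i, x in enumerate(X) if x[dim] < cut]
    right = [i for i, x in enumerate(X) if x[dim] >= cut]
    if not left or not right:
        return Counter(y)
    return (dim, cut,
            build_tree([X[i] for i in left], [y[i] for i in left], rng, max_depth - 1),
            build_tree([X[i] for i in right], [y[i] for i in right], rng, max_depth - 1))

def leaf_distribution(node, x):
    while isinstance(node, tuple):
        dim, cut, left, right = node
        node = left if x[dim] < cut else right
    total = sum(node.values())
    return {c: n / total for c, n in node.items()}

def predict(forest, x):
    """Average the leaf class distributions over all trees."""
    votes = Counter()
    for tree in forest:
        for c, p in leaf_distribution(tree, x).items():
            votes[c] += p
    return votes.most_common(1)[0][0]

rng = random.Random(3)
X = [(rng.random(), rng.random()) for _ in range(200)]
y = [int(a + b > 1.0) for a, b in X]           # a simple toy classification task
forest = [build_tree(X, y, rng, max_depth=8) for _ in range(50)]
print(predict(forest, (0.9, 0.9)), predict(forest, (0.1, 0.1)))
```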
Machine Learning | 2013
Kai Ming Ting; Guang-Tong Zhou; Fei Tony Liu; Swee Chuan Tan
This paper introduces mass estimation, a base modelling mechanism that can be employed to solve various tasks in machine learning. We present the theoretical basis of mass and efficient methods to estimate mass. We show that mass estimation solves problems effectively in tasks such as information retrieval, regression and anomaly detection. The models, which use mass in these three tasks, perform at least as well as, and often better than, eight state-of-the-art methods in terms of task-specific performance measures. In addition, mass estimation has constant time and space complexities.
Knowledge and Information Systems | 2013
Kai Ming Ting; Takashi Washio; Jonathan R. Wells; Fei Tony Liu; Sunil Aryal
Density estimation is the ubiquitous base modelling mechanism employed for many tasks including clustering, classification, anomaly detection and information retrieval. Commonly used density estimation methods such as kernel density estimator and
International Conference on Data Mining | 2014
Tharindu Rukshan Bandaragoda; Kai Ming Ting; David W. Albrecht; Fei Tony Liu; Jonathan R. Wells
Proceedings of the Tenth International Workshop on Multimedia Data Mining | 2010
Guang-Tong Zhou; Kai Ming Ting; Fei Tony Liu; Yilong Yin
Knowledge Discovery and Data Mining | 2006
Fei Tony Liu; Kai Ming Ting