Fei Tony Liu
Monash University
Publication
Featured research published by Fei Tony Liu.
ACM Transactions on Knowledge Discovery From Data | 2012
Fei Tony Liu; Kai Ming Ting; Zhi-Hua Zhou
Anomalies are data points that are few and different. As a result of these properties, we show that anomalies are susceptible to a mechanism called isolation. This article proposes a method called Isolation Forest (iForest), which detects anomalies purely based on the concept of isolation without employing any distance or density measure, making it fundamentally different from all existing methods. As a result, iForest is able to exploit subsampling (i) to achieve a low linear time complexity and a small memory requirement and (ii) to deal with the effects of swamping and masking effectively. Our empirical evaluation shows that iForest outperforms ORCA, one-class SVM, LOF and Random Forests in terms of AUC and processing time, and that it is robust against masking and swamping effects. iForest also works well in high-dimensional problems containing a large number of irrelevant attributes, and when anomalies are not available in the training sample.
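The isolation idea can be sketched in a few lines of Python. This is a toy illustration of the concept only, not the paper's algorithm; the data, number of trees and depth limit below are arbitrary choices made here. A point that is "few and different" tends to be separated from the rest by fewer random splits, so its average path length over many random partitions is shorter.

```python
import random

def isolation_path_length(point, data, rng, max_depth=10, depth=0):
    """Depth at which `point` is isolated by recursive random splits of `data`."""
    if depth >= max_depth or len(data) <= 1:
        return depth
    dim = rng.randrange(len(point))
    lo = min(x[dim] for x in data)
    hi = max(x[dim] for x in data)
    if lo == hi:
        return depth
    split = rng.uniform(lo, hi)
    # Keep only the points on the same side of the split as `point`.
    side = [x for x in data if (x[dim] < split) == (point[dim] < split)]
    return isolation_path_length(point, side, rng, max_depth, depth + 1)

rng = random.Random(0)
normal = [(rng.gauss(0, 1), rng.gauss(0, 1)) for _ in range(256)]
anomaly = (6.0, 6.0)

def avg_path(p, trials=100):
    return sum(isolation_path_length(p, normal, rng) for _ in range(trials)) / trials

# The anomalous point is typically isolated closer to the root
# (shorter average path) than a typical point from the normal cluster.
print("anomaly:", avg_path(anomaly))
print("normal :", avg_path(normal[0]))
```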
Knowledge Discovery and Data Mining | 2010
Kai Ming Ting; Guang-Tong Zhou; Fei Tony Liu; James Swee Chuan Tan
This paper introduces mass estimation, a base modelling mechanism in data mining. It provides the theoretical basis of mass and an efficient method to estimate mass. We show that it solves problems very effectively in tasks such as information retrieval, regression and anomaly detection. The models, which use mass in these three tasks, perform at least as well as, and often better than, a total of eight state-of-the-art methods in terms of task-specific performance measures. In addition, mass estimation has constant time and space complexities.
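As a rough illustration of the intuition behind mass (this is not the paper's estimator; the one-dimensional setting, split scheme and constants below are simplifications chosen here), the sketch estimates the mass of a point as the average number of data points falling on its side of a random split. Central, typical points receive higher mass than fringe points, so low mass can serve as an anomaly score.

```python
import random

def mass_estimate(x, data, rng, n_splits=200):
    """Toy 1-D mass estimate: average, over random splits of the data range,
    of the number of points lying on the same side of the split as x."""
    lo, hi = min(data), max(data)
    total = 0
    for _ in range(n_splits):
        s = rng.uniform(lo, hi)
        same_side = [v for v in data if (v < s) == (x < s)]
        total += len(same_side)
    return total / n_splits

rng = random.Random(1)
data = [rng.gauss(0, 1) for _ in range(500)]

print("mass near centre:", mass_estimate(0.0, data, rng))
print("mass near fringe:", mass_estimate(4.0, data, rng))
```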
European Conference on Machine Learning | 2010
Fei Tony Liu; Kai Ming Ting; Zhi-Hua Zhou
Detecting local clustered anomalies is an intricate problem for many existing anomaly detection methods. Distance-based and density-based methods are inherently restricted by their basic assumptions: anomalies are either far from normal points or sparse. Clustered anomalies are able to avoid detection since they defy these assumptions by being dense and, in many cases, in close proximity to normal instances. In this paper, without using any density or distance measure, we propose a new method called SCiForest to detect clustered anomalies. SCiForest separates clustered anomalies from normal points effectively even when clustered anomalies are very close to normal points. It maintains the ability of existing methods to detect scattered anomalies, and it has superior time and space complexities compared with existing distance-based and density-based methods.
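A small synthetic example may make the setting concrete (the cluster location, spread and sizes below are arbitrary choices, not taken from the paper): clustered anomalies form a tight group that is neither distant nor sparse, which is exactly the case that defeats the distance and density assumptions.

```python
import random

rng = random.Random(2)

# Normal data: a broad cluster around the origin.
normal = [(rng.gauss(0, 1), rng.gauss(0, 1)) for _ in range(1000)]

# Clustered anomalies: a small, tight (hence locally dense) cluster sitting
# close to the normal data -- neither "far away" nor "sparse", so distance-
# and density-based detectors have little margin to separate them.
clustered_anomalies = [(rng.gauss(2.5, 0.05), rng.gauss(2.5, 0.05))
                       for _ in range(30)]
```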
Pattern Recognition | 2012
Guang-Tong Zhou; Kai Ming Ting; Fei Tony Liu; Yilong Yin
This paper presents a novel ranking framework for content-based multimedia information retrieval (CBMIR). The framework introduces relevance features and a new ranking scheme. Each relevance feature measures the relevance of an instance with respect to a profile of the targeted multimedia database. We show that the task of CBMIR can be done more effectively using the relevance features than the original features. Furthermore, additional performance gain is achieved by incorporating our new ranking scheme which modifies instance rankings based on the weighted average of relevance feature values. Experiments on image and music databases validate the efficacy and efficiency of the proposed framework.
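A schematic sketch of the re-ranking step described above, with placeholder relevance feature values and weights (the feature definitions, weighting and instance names here are illustrative assumptions, not the paper's): instances are ordered by the weighted average of their relevance feature values.

```python
from typing import Dict, List

def rerank(relevance_features: Dict[str, List[float]],
           weights: List[float]) -> List[str]:
    """Rank instances by the weighted average of their relevance feature
    values (higher score = more relevant)."""
    def score(values: List[float]) -> float:
        return sum(w * v for w, v in zip(weights, values)) / sum(weights)
    return sorted(relevance_features,
                  key=lambda k: score(relevance_features[k]),
                  reverse=True)

# Three candidate instances, each described by two relevance feature values.
features = {"img_a": [0.9, 0.4], "img_b": [0.6, 0.8], "img_c": [0.2, 0.3]}
print(rerank(features, weights=[0.5, 0.5]))  # ['img_b', 'img_a', 'img_c']
```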
Knowledge Discovery and Data Mining | 2005
Fei Tony Liu; Kai Ming Ting; Wei Fan
One way to lower the generalization error of a decision tree ensemble is to maximize tree diversity. Building complete-random trees forgoes the strength obtained from a test selection criterion, but it achieves higher tree diversity. We provide a taxonomy of different randomization methods and find that complete-random test selection produces diverse trees, whereas other randomization methods such as bootstrap sampling may impair tree growth and limit tree diversity. The well-accepted practice in constructing decision tree ensembles is to apply bootstrap sampling and voting. To challenge this practice, we explore eight variants of complete-random trees using three parameters: ensemble methods, tree height restriction and sample randomization. Surprisingly, the most accurate variant is very simple and performs comparably to Bagging and Random Forests. It achieves good results by maximizing tree diversity and is called Max-diverse Ensemble.
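The complete-random construction can be sketched as follows. This is a minimal toy implementation for illustration only; the dataset, height limit, ensemble size and the probability-averaging combination are choices made here, not details taken from the paper. Each split uses a randomly chosen attribute and a random cut point, with no test selection criterion, and trees are combined by averaging leaf class distributions.

```python
import random
from collections import Counter

def build_tree(X, y, rng, max_depth):
    """Grow a complete-random tree: every split uses a randomly chosen
    attribute and a random cut point (no split-quality criterion)."""
    if max_depth == 0 or len(set(y)) <= 1:
        return Counter(y)                      # leaf: class counts
    dim = rng.randrange(len(X[0]))
    values = [x[dim] for x in X]
    if min(values) == max(values):
        return Counter(y)
    cut = rng.uniform(min(values), max(values))
    left = [i for i, x in enumerate(X) if x[dim] < cut]
    right = [i for i, x in enumerate(X) if x[dim] >= cut]
    if not left or not right:
        return Counter(y)
    return (dim, cut,
            build_tree([X[i] for i in left], [y[i] for i in left], rng, max_depth - 1),
            build_tree([X[i] for i in right], [y[i] for i in right], rng, max_depth - 1))

def leaf_distribution(node, x):
    while isinstance(node, tuple):
        dim, cut, left, right = node
        node = left if x[dim] < cut else right
    total = sum(node.values())
    return {c: n / total for c, n in node.items()}

def predict(forest, x):
    """Average the leaf class distributions over all trees."""
    votes = Counter()
    for tree in forest:
        for c, p in leaf_distribution(tree, x).items():
            votes[c] += p
    return votes.most_common(1)[0][0]

rng = random.Random(3)
X = [(rng.random(), rng.random()) for _ in range(200)]
y = [int(a + b > 1.0) for a, b in X]           # a simple toy classification task
forest = [build_tree(X, y, rng, max_depth=8) for _ in range(50)]
print(predict(forest, (0.9, 0.9)), predict(forest, (0.1, 0.1)))
```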
Machine Learning | 2013
Kai Ming Ting; Guang-Tong Zhou; Fei Tony Liu; Swee Chuan Tan
This paper introduces mass estimation, a base modelling mechanism that can be employed to solve various tasks in machine learning. We present the theoretical basis of mass and efficient methods to estimate mass. We show that mass estimation solves problems effectively in tasks such as information retrieval, regression and anomaly detection. The models, which use mass in these three tasks, perform at least as well as, and often better than, eight state-of-the-art methods in terms of task-specific performance measures. In addition, mass estimation has constant time and space complexities.
Knowledge and Information Systems | 2013
Kai Ming Ting; Takashi Washio; Jonathan R. Wells; Fei Tony Liu; Sunil Aryal
Density estimation is the ubiquitous base modelling mechanism employed for many tasks including clustering, classification, anomaly detection and information retrieval. Commonly used density estimation methods such as kernel density estimator and
International Conference on Data Mining | 2014
Tharindu Rukshan Bandaragoda; Kai Ming Ting; David W. Albrecht; Fei Tony Liu; Jonathan R. Wells
Proceedings of the Tenth International Workshop on Multimedia Data Mining | 2010
Guang-Tong Zhou; Kai Ming Ting; Fei Tony Liu; Yilong Yin
Knowledge Discovery and Data Mining | 2006
Fei Tony Liu; Kai Ming Ting