Network


Latest external collaborations at the country level.

Hotspot


Dive into the research topics where Kai Ming Ting is active.

Publication


Featured research published by Kai Ming Ting.


IEEE Transactions on Knowledge and Data Engineering | 2002

An instance-weighting method to induce cost-sensitive trees

Kai Ming Ting

We introduce an instance-weighting method to induce cost-sensitive trees. It is a generalization of the standard tree induction process in which only the initial instance weights determine the type of tree to be induced: minimum error trees or minimum high cost error trees. We demonstrate that it can be easily adapted to an existing tree learning algorithm. Previous research provides insufficient evidence to support the idea that the greedy divide-and-conquer algorithm can effectively induce a truly cost-sensitive tree directly from the training data. We provide this empirical evidence in this paper. The algorithm incorporating the instance-weighting method is found to be better than the original algorithm in terms of total misclassification costs, the number of high cost errors, and tree size in two-class data sets. The instance-weighting method is simpler and more effective to implement than a previous method based on altered priors.
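The weighting scheme described above can be sketched with a standard decision-tree learner: each instance receives a weight proportional to its class's misclassification cost, normalised so that the total weight equals the number of training instances. The toy data and the cost values below are illustrative, not from the paper.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.tree import DecisionTreeClassifier

# Toy imbalanced two-class problem; the per-class costs are illustrative.
X, y = make_classification(n_samples=400, weights=[0.9, 0.1], random_state=0)
cost = {0: 1.0, 1: 10.0}  # cost of misclassifying an instance of each class

# Instance weights proportional to class cost, normalised so the total
# weight equals the number of training instances.
w = np.array([cost[c] for c in y], dtype=float)
w *= len(y) / w.sum()

# A standard greedy tree learner then induces a cost-sensitive tree
# simply by honouring the instance weights.
tree = DecisionTreeClassifier(random_state=0).fit(X, y, sample_weight=w)
```

Because the change lives entirely in the instance weights, any learner that accepts weighted instances can be adapted this way without touching its splitting criterion.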


International Conference on Data Mining | 2008

Isolation Forest

Fei Tony Liu; Kai Ming Ting; Zhi-Hua Zhou

Most existing model-based approaches to anomaly detection construct a profile of normal instances, then identify instances that do not conform to the normal profile as anomalies. This paper proposes a fundamentally different model-based method that explicitly isolates anomalies instead of profiling normal points. To the best of our knowledge, the concept of isolation has not been explored in the current literature. The use of isolation enables the proposed method, iForest, to exploit sub-sampling to an extent that is not feasible in existing methods, creating an algorithm with a linear time complexity, a low constant, and a low memory requirement. Our empirical evaluation shows that iForest performs favourably against ORCA (a near-linear-time distance-based method), LOF, and random forests in terms of AUC and processing time, especially on large data sets. iForest also works well in high-dimensional problems with a large number of irrelevant attributes, and in situations where the training set does not contain any anomalies.
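The iForest algorithm described here is available in scikit-learn as `IsolationForest`. A minimal usage sketch on toy data (the cluster and the outlier point are made up for illustration):

```python
import numpy as np
from sklearn.ensemble import IsolationForest

rng = np.random.RandomState(42)
X_normal = rng.normal(0.0, 1.0, size=(256, 2))  # dense "normal" cluster
X_outlier = np.array([[8.0, 8.0]])              # a clearly isolated point

# max_samples is the sub-sampling size central to iForest's efficiency.
clf = IsolationForest(n_estimators=100, max_samples=128, random_state=42)
clf.fit(X_normal)

# Lower decision_function values mean shorter isolation paths,
# i.e. points that are easier to isolate and hence more anomalous.
print(clf.decision_function(X_outlier), clf.predict(X_outlier))
```

Note that the forest is fit on normal data only, matching the paper's observation that iForest works even when the training set contains no anomalies.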


IEEE Transactions on Multimedia | 2011

A Survey of Audio-Based Music Classification and Annotation

Zhouyu Fu; Guojun Lu; Kai Ming Ting; Dengsheng Zhang

Music information retrieval (MIR) is an emerging research area that receives growing attention from both the research community and the music industry. It addresses the problem of querying and retrieving certain types of music from large music data sets. Classification is a fundamental problem in MIR. Many tasks in MIR can be naturally cast in a classification setting, such as genre classification, mood classification, artist recognition, instrument recognition, etc. Music annotation, a new research area in MIR that has attracted much attention in recent years, is also a classification problem in the general sense. Due to the importance of music classification in MIR research, the rapid development of new methods, and the lack of review papers on recent progress in the field, we provide a comprehensive review of audio-based classification in this paper and systematically summarize the state-of-the-art techniques for music classification. Specifically, we have stressed the difference in the features and the types of classifiers used for different classification tasks. This survey emphasizes recent developments in these techniques and discusses several open issues for future research.


ACM Transactions on Knowledge Discovery From Data | 2012

Isolation-Based Anomaly Detection

Fei Tony Liu; Kai Ming Ting; Zhi-Hua Zhou

Anomalies are data points that are few and different. As a result of these properties, we show that anomalies are susceptible to a mechanism called isolation. This article proposes a method called Isolation Forest (iForest), which detects anomalies purely on the basis of isolation, without employing any distance or density measure, making it fundamentally different from all existing methods. As a result, iForest is able to exploit subsampling (i) to achieve a low linear time complexity and a small memory requirement and (ii) to deal effectively with the effects of swamping and masking. Our empirical evaluation shows that iForest outperforms ORCA, one-class SVM, LOF, and Random Forests in terms of AUC and processing time, and that it is robust against masking and swamping effects. iForest also works well in high-dimensional problems containing a large number of irrelevant attributes, and when anomalies are not available in the training sample.
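The isolation-based anomaly score underlying iForest is s(x, n) = 2^(-E[h(x)] / c(n)), where E[h(x)] is the average path length of x over the trees and c(n) is the average path length of an unsuccessful binary-search-tree search on n points, used as a normaliser. A small sketch of the scoring arithmetic (the path lengths below are made up, not from any real forest):

```python
import math

def c(n):
    """Average path length of an unsuccessful BST search on n points,
    used to normalise path lengths across subsample sizes."""
    if n <= 1:
        return 0.0
    harmonic = math.log(n - 1) + 0.5772156649  # harmonic-number approximation
    return 2.0 * harmonic - 2.0 * (n - 1) / n

def anomaly_score(avg_path_length, n):
    """s(x, n) = 2 ** (-E[h(x)] / c(n)); values near 1 flag anomalies,
    values well below 0.5 indicate normal points."""
    return 2.0 ** (-avg_path_length / c(n))

# A point isolated after ~2 splits on average versus one needing ~12,
# with subsamples of size 256.
print(anomaly_score(2.0, 256), anomaly_score(12.0, 256))
```

Because only path lengths over small random trees are needed, no pairwise distances or density estimates ever enter the score.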


European Conference on Principles of Data Mining and Knowledge Discovery | 1998

Inducing Cost-Sensitive Trees via Instance Weighting

Kai Ming Ting

We introduce an instance-weighting method to induce cost-sensitive trees in this paper. It is a generalization of the standard tree induction process in which only the initial instance weights determine the type of tree to be induced: minimum error trees or minimum high cost error trees. We demonstrate that it can be easily adapted to an existing tree learning algorithm. Previous research gave insufficient evidence to support the claim that the greedy divide-and-conquer algorithm can effectively induce a truly cost-sensitive tree directly from the training data. We provide this empirical evidence in this paper. The algorithm incorporating the instance-weighting method is found to be better than the original algorithm in terms of total misclassification costs, the number of high cost errors, and tree size in two-class datasets. The instance-weighting method is simpler and more effective to implement than a previous method based on altered priors.


Australian Joint Conference on Artificial Intelligence | 2006

z-SVM: an SVM for improved classification of imbalanced data

Tasadduq Imam; Kai Ming Ting; Joarder Kamruzzaman

Recent literature has revealed that the decision boundary of a Support Vector Machine (SVM) classifier skews towards the minority class for imbalanced data, resulting in a high misclassification rate for minority samples. In this paper, we present a novel strategy for SVM in the class-imbalanced scenario. In particular, we focus on orienting the trained decision boundary of the SVM so that a good margin between the decision boundary and each of the classes is maintained, and classification performance is improved for imbalanced data. In contrast to existing strategies that introduce additional parameters, the values of which are determined through empirical search involving multiple SVM trainings, our strategy corrects the skew of the learned SVM model automatically, irrespective of the choice of learning parameters and without multiple SVM trainings. We compare our strategy with SVM and with SMOTE, a widely accepted strategy for imbalanced data applied to SVM, on five well-known imbalanced datasets. Our strategy demonstrates improved classification performance for imbalanced data and is less sensitive to the selection of SVM learning parameters.
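The core of the strategy is to rescale the trained SVM's minority-class dual coefficients by a factor z, which shifts the learned decision boundary back toward the majority class. A rough sketch with a linear kernel follows; the toy data and the hand-picked z are illustrative, whereas the paper determines z automatically without retraining.

```python
import numpy as np
from sklearn.svm import SVC

rng = np.random.RandomState(0)
# Imbalanced toy data: 200 majority (-1) samples vs 20 minority (+1).
X = np.vstack([rng.normal(-2.0, 1.0, (200, 2)), rng.normal(2.0, 1.0, (20, 2))])
y = np.array([-1] * 200 + [1] * 20)

svm = SVC(kernel="linear").fit(X, y)

def decision(Xq, z):
    """Decision values with the minority class's dual coefficients scaled
    by z; z = 1 reproduces the ordinary SVM decision function."""
    K = Xq @ svm.support_vectors_.T      # linear kernel values
    coef = svm.dual_coef_[0].copy()      # alpha_i * y_i per support vector
    coef[y[svm.support_] == 1] *= z      # rescale minority-class coefficients
    return K @ coef + svm.intercept_[0]

minority = X[y == 1]
# Scaling up the minority coefficients shifts the boundary toward the
# majority class, raising decision values on minority samples.
print(decision(minority, 3.0).mean() > decision(minority, 1.0).mean())
```

The appeal of this family of corrections is that the SVM is trained once; only the already-computed dual coefficients are adjusted afterwards.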


Machine Learning | 2005

On the application of ROC analysis to predict classification performance under varying class distributions

Geoffrey I. Webb; Kai Ming Ting

We counsel caution in the application of ROC analysis for prediction of classifier performance under varying class distributions. We argue that it is not reasonable to expect ROC analysis to provide accurate prediction of model performance under varying distributions if the classes contain causally relevant subclasses whose frequencies may vary at different rates or if there are attributes upon which the classes are causally dependent.
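The standard use of ROC analysis that the paper cautions about assumes the operating point (TPR, FPR) is invariant to the class prior, so distribution-dependent measures can be recomputed for any prior. A quick illustration of that recomputation for precision; the paper's warning is precisely that the assumed invariance can fail when subclass frequencies shift:

```python
# A fixed ROC operating point (TPR, FPR) combined with a class prior
# pi = P(positive) yields the predicted precision at deployment time.
def precision(tpr, fpr, pi):
    return (pi * tpr) / (pi * tpr + (1 - pi) * fpr)

tpr, fpr = 0.9, 0.1  # one point in ROC space, assumed fixed as pi varies
print(precision(tpr, fpr, 0.5))   # balanced classes
print(precision(tpr, fpr, 0.01))  # rare positives: precision collapses
```

Even under the invariance assumption, the same ROC point can correspond to wildly different deployed performance, which is why the additional failure mode the paper identifies matters in practice.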


Machine Learning | 2012

Learning by extrapolation from marginal to full-multivariate probability distributions: decreasingly naive Bayesian classification

Geoffrey I. Webb; Janice R. Boughton; Fei Zheng; Kai Ming Ting; Houssam Salem

Averaged n-Dependence Estimators (AnDE) is an approach to probabilistic classification learning that learns by extrapolation from marginal to full-multivariate probability distributions. It utilizes a single parameter that transforms the approach between a low-variance high-bias learner (Naive Bayes) and a high-variance low-bias learner with Bayes-optimal asymptotic error. It extends the underlying strategy of Averaged One-Dependence Estimators (AODE), which relaxes the Naive Bayes independence assumption while retaining many of Naive Bayes' desirable computational and theoretical properties. AnDE further relaxes the independence assumption by generalizing AODE to higher levels of dependence. Extensive experimental evaluation shows that the bias-variance trade-off for Averaged 2-Dependence Estimators results in strong predictive accuracy over a wide range of data sets. It has training time linear with respect to the number of examples, learns in a single pass through the training data, supports incremental learning, handles missing values directly, and is robust in the face of noise. Beyond the practical utility of its lower-dimensional variants, AnDE is of interest in that it demonstrates that it is possible to create low-bias high-variance generative learners, and it suggests strategies for developing even more powerful classifiers.
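A minimal sketch of the n = 1 case (AODE) on discrete data may make the extrapolation concrete: each attribute in turn acts as a "super-parent", giving an estimate of P(c, x_p) multiplied by the product of P(x_j | c, x_p), and the resulting one-dependence estimates are averaged. This is an unsmoothed toy implementation on made-up data, not the authors' code:

```python
import numpy as np

def aode_scores(X, y, xq):
    """Toy AODE for discrete attributes: average one-dependence estimators,
    one per super-parent attribute. Unsmoothed frequency estimates, and
    parents unseen with the class are skipped, for brevity only."""
    n, d = X.shape
    scores = {}
    for c in sorted(set(y)):
        total = 0.0
        for p in range(d):
            mask = (y == c) & (X[:, p] == xq[p])
            joint = mask.mean()               # P(c, x_p) from counts
            if joint == 0.0:
                continue                      # unsupported super-parent
            cond = 1.0
            for j in range(d):
                if j != p:
                    cond *= (X[mask, j] == xq[j]).mean()  # P(x_j | c, x_p)
            total += joint * cond
        scores[c] = total / d
    return scores

X = np.array([[1, 1], [1, 0], [1, 1], [0, 0], [0, 1], [0, 0]])
y = np.array([1, 1, 1, 0, 0, 0])
print(aode_scores(X, y, np.array([1, 1])))
```

Raising n enlarges the super-parent sets and lowers bias at the cost of variance, which is the single trade-off knob the abstract describes.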


International Joint Conference on Artificial Intelligence | 2011

Fast anomaly detection for streaming data

Swee Chuan Tan; Kai Ming Ting; Tony Fei Liu

This paper introduces Streaming Half-Space-Trees (HS-Trees), a fast one-class anomaly detector for evolving data streams. It requires only normal data for training and works well when anomalous data are rare. The model features an ensemble of random HS-Trees, and the tree structure is constructed without any data. This makes the method highly efficient because it requires no model restructuring when adapting to evolving data streams. Our analysis shows that Streaming HS-Trees has constant amortised time complexity and constant memory requirement. When compared with a state-of-the-art method, our method performs favourably in terms of detection accuracy and runtime performance. Our experimental results also show that the detection performance of Streaming HS-Trees is not sensitive to its parameter settings.
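The defining trick, building the tree structure without any data, can be sketched as follows: splits are random half-space bisections of an assumed value range, and only the mass counts at the nodes are updated from the stream. This is a simplified single-tree, single-window sketch; the actual method uses an ensemble of trees with paired reference and latest windows.

```python
import random

def build_tree(mins, maxs, depth, max_depth):
    """Grow one HS-Tree: each node bisects a randomly chosen dimension of
    the current value range. No data is needed, only a work space."""
    if depth == max_depth:
        return {"mass": 0, "depth": depth}
    q = random.randrange(len(mins))
    split = (mins[q] + maxs[q]) / 2.0
    lmaxs, rmins = maxs[:], mins[:]
    lmaxs[q], rmins[q] = split, split
    return {"q": q, "split": split, "mass": 0, "depth": depth,
            "left": build_tree(mins, lmaxs, depth + 1, max_depth),
            "right": build_tree(rmins, maxs, depth + 1, max_depth)}

def update(node, x):
    """Record one windowed instance along its path (the mass profile)."""
    node["mass"] += 1
    if "q" in node:
        update(node["left" if x[node["q"]] < node["split"] else "right"], x)

def score(node, x):
    """mass * 2^depth at the terminating node; low scores flag anomalies."""
    if "q" not in node or node["mass"] <= 1:
        return node["mass"] * 2 ** node["depth"]
    return score(node["left" if x[node["q"]] < node["split"] else "right"], x)

random.seed(0)
tree = build_tree([0.0, 0.0], [1.0, 1.0], 0, 5)
for _ in range(50):                 # stream a dense region near (0.2, 0.2)
    update(tree, [0.2, 0.2])
print(score(tree, [0.2, 0.2]), score(tree, [0.9, 0.9]))
```

Because the structure is fixed up front, adapting to a new window only requires swapping mass counters, which is what gives the method its constant amortised update cost.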


Machine Learning | 2007

Classifying under computational resource constraints: anytime classification using probabilistic estimators

Ying Yang; Geoffrey I. Webb; Kevin B. Korb; Kai Ming Ting

In many online applications of machine learning, the computational resources available for classification will vary from time to time. Most techniques are designed to operate within the constraints of the minimum expected resources and fail to utilize further resources when they are available. We propose a novel anytime classification algorithm, anytime averaged probabilistic estimators (AAPE), which is capable of delivering strong prediction accuracy with little CPU time and utilizing additional CPU time to increase classification accuracy. The idea is to run an ordered sequence of very efficient Bayesian probabilistic estimators (single improvement steps) until classification time runs out. Theoretical studies and empirical validations reveal that by properly identifying, ordering, invoking and ensembling single improvement steps, AAPE is able to accomplish accurate classification whenever it is interrupted. It is also able to output class probability estimates beyond simple 0/1-loss classifications, as well as adeptly handle incremental learning.
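The interruptible loop at the heart of an anytime classifier can be sketched generically: run the ordered improvement steps until the budget expires, then ensemble whatever has completed. The estimators below are stand-ins returning fixed class-probability vectors, not AAPE's Bayesian steps.

```python
import time

def anytime_classify(steps, x, budget_s):
    """Generic anytime loop (a sketch, not AAPE itself): invoke ordered
    improvement steps until the budget expires, averaging the class
    distributions produced so far; always completes at least one step."""
    deadline = time.monotonic() + budget_s
    votes, used = None, 0
    for step in steps:
        if used > 0 and time.monotonic() >= deadline:
            break  # interrupted: fall back to the ensemble built so far
        p = step(x)
        votes = p if votes is None else [a + b for a, b in zip(votes, p)]
        used += 1
    total = sum(votes)
    return [v / total for v in votes], used

# Stand-in estimators producing two-class probability vectors.
estimators = [lambda x: [0.6, 0.4], lambda x: [0.8, 0.2], lambda x: [0.7, 0.3]]
probs, n_used = anytime_classify(estimators, None, budget_s=1.0)
print(probs, n_used)
```

The key property is that the answer quality degrades gracefully: interrupting early simply returns the average over fewer steps rather than failing outright.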

Collaboration


Dive into Kai Ming Ting's collaborations.

Top Co-Authors
