Hock Hee Ang

Nanyang Technological University

Publication

Featured research published by Hock Hee Ang.

database systems for advanced applications | 2010

Mining outliers with ensemble of heterogeneous detectors on random subspaces

Hoang Vu Nguyen; Hock Hee Ang; Vivekanand Gopalkrishnan

Outlier detection has many practical applications, especially in domains that have scope for abnormal behavior. Despite the importance of detecting outliers, defining outliers in fact is a nontrivial task which is normally application-dependent. On the other hand, detection techniques are constructed around the chosen definitions. As a consequence, available detection techniques vary significantly in terms of accuracy, performance and issues of the detection problem which they address. In this paper, we propose a unified framework for combining different outlier detection algorithms. Unlike existing work, our approach combines non-compatible techniques of different types to improve the outlier detection accuracy compared to other ensemble and individual approaches. Through extensive empirical studies, our framework is shown to be very effective in detecting outliers in the real-world context.

european conference on machine learning | 2008

Cascade RSVM in Peer-to-Peer Networks

Hock Hee Ang; Vivekanand Gopalkrishnan; Steven C. H. Hoi; Wee Keong Ng

The goal of distributed learning in P2P networks is to achieve results as close as possible to those from centralized approaches. Learning models of classification in a P2P network faces several challenges like scalability, peer dynamism, asynchronism and data privacy preservation. In this paper, we study the feasibility of building SVM classifiers in a P2P network. We show how cascading SVM can be mapped to a P2P network of data propagation. Our proposed P2P SVM provides a method for constructing classifiers in P2P networks with classification accuracy comparable to centralized classifiers and better than other distributed classifiers. The proposed algorithm also satisfies the characteristics of P2P computing and has an upper bound on the communication overhead. Extensive experimental results confirm the feasibility and attractiveness of this approach.

IEEE Transactions on Knowledge and Data Engineering | 2013

Predictive Handling of Asynchronous Concept Drifts in Distributed Environments

Hock Hee Ang; Vivekanand Gopalkrishnan; Indre Zliobaite; Mykola Pechenizkiy; Steven C. H. Hoi

In a distributed computing environment, peers collaboratively learn to classify concepts of interest from each other. When external changes happen and their concepts drift, the peers should adapt to avoid increase in misclassification errors. The problem of adaptation becomes more difficult when the changes are asynchronous, i.e., when peers experience drifts at different times. We address this problem by developing an ensemble approach, PINE, that combines reactive adaptation via drift detection, and proactive handling of upcoming changes via early warning and adaptation across the peers. With empirical study on simulated and real-world data sets, we show that PINE handles asynchronous concept drifts better and faster than current state-of-the-art approaches, which have been designed to work in less challenging environments. In addition, PINE is parameter insensitive and incurs less communication cost while achieving better accuracy.

european conference on machine learning | 2009

Communication-Efficient Classification in P2P Networks

Hock Hee Ang; Vivekanand Gopalkrishnan; Wee Keong Ng; Steven C. H. Hoi

Distributed classification aims to learn with accuracy comparable to that of centralized approaches but at far lower communication and computation costs. By nature, P2P networks provide an excellent environment for performing a distributed classification task due to the high availability of shared resources, such as bandwidth, storage space, and rich computational power. However, learning in P2P networks faces many challenging issues, namely scalability, peer dynamism, asynchronism and fault tolerance. In this paper, we address these challenges by presenting CEMPaR, a communication-efficient framework based on cascading SVMs that exploits the characteristics of DHT-based lookup protocols. CEMPaR is designed to be robust to parameters such as the number of peers in the network, imbalanced data sizes and class distribution while incurring extremely low communication cost yet maintaining accuracy comparable to the best-in-class approaches. Feasibility and effectiveness of our approach are demonstrated with extensive experimental studies on real and synthetic datasets.

european conference on machine learning | 2010

On classifying drifting concepts in P2P networks

Hock Hee Ang; Vivekanand Gopalkrishnan; Wee Keong Ng; Steven C. H. Hoi

Concept drift is a common challenge for many real-world data mining and knowledge discovery applications. Most of the existing studies for concept drift are based on centralized settings, and are often hard to adapt in a distributed computing environment. In this paper, we investigate a new research problem, P2P concept drift detection, which aims to effectively classify drifting concepts in P2P networks. We propose a novel P2P learning framework for concept drift classification, which includes both reactive and proactive approaches to classify the drifting concepts in a distributed manner. Our empirical study shows that the proposed technique is able to effectively detect the drifting concepts and improve the classification performance.

database systems for advanced applications | 2010

Adaptive ensemble classification in P2P networks

Hock Hee Ang; Vivekanand Gopalkrishnan; Steven C. H. Hoi; Wee Keong Ng

Classification in P2P networks has become an important research problem in data mining due to the popularity of P2P computing environments. This is still an open difficult research problem due to a variety of challenges, such as non-i.i.d. data distribution, skewed or disjoint class distribution, scalability, peer dynamism and asynchronism. In this paper, we present a novel P2P Adaptive Classification Ensemble (PACE) framework to perform classification in P2P networks. Unlike regular ensemble classification approaches, our new framework adapts to the test data distribution and dynamically adjusts the voting scheme by combining a subset of classifiers/peers according to the test data example. In our approach, we implement the proposed PACE solution together with the state-of-the-art linear SVM as the base classifier for scalable P2P classification. Extensive empirical studies show that the proposed PACE method is both efficient and effective in improving classification performance over regular methods under various adverse conditions.

ACM Transactions on Knowledge Discovery From Data | 2013

Classification in P2P networks with cascade support vector machines

Hock Hee Ang; Vivekanand Gopalkrishnan; Steven C. H. Hoi; Wee Keong Ng

Classification in Peer-to-Peer (P2P) networks is important to many real applications, such as distributed intrusion detection, distributed recommendation systems, and distributed antispam detection. However, it is very challenging to perform classification in P2P networks due to many practical issues, such as scalability, peer dynamism, and asynchronism. This article investigates the practical techniques of constructing Support Vector Machine (SVM) classifiers in the P2P networks. In particular, we demonstrate how to efficiently cascade SVM in a P2P network with the use of reduced SVM. In addition, we propose to fuse the concept of cascade SVM with bootstrap aggregation to effectively balance the trade-off between classification accuracy, model construction, and prediction cost. We provide theoretical insights for the proposed solutions and conduct an extensive set of empirical studies on a number of large-scale datasets. Encouraging results validate the efficacy of the proposed approach.

very large data bases | 2010

P2PDocTagger: content management through automated P2P collaborative tagging

Hock Hee Ang; Vivekanand Gopalkrishnan; Wee Keong Ng; Steven C. H. Hoi

As the amount of user-generated content grows, personal information management has become a challenging problem. Several information management approaches, such as desktop search, document organization and (collaborative) document tagging, have been proposed to address this; however, they are either inappropriate or inefficient. Automated collaborative document tagging approaches mitigate the problems of manual tagging, but they are usually based on centralized settings, which are plagued by problems such as scalability and privacy. To resolve these issues, we present P2PDocTagger, an automated and distributed document tagging system based on classification in P2P networks. P2PDocTagger minimizes the efforts of individual peers and reduces computation and communication cost while providing high tagging accuracy and easing document organization/retrieval. In addition, we provide a realistic and flexible simulation toolkit, P2PDMT, to facilitate the development and testing of P2P data mining algorithms.

international conference on data mining | 2010

Distributed Classification on Peers with Variable Data Spaces and Distributions

Quach Vinh Thanh; Vivekanand Gopalkrishnan; Hock Hee Ang

The promise of distributed classification is to improve the classification accuracy of peers on their respective local data, using the knowledge of other peers in the distributed network. Though in reality, data across peers may be drastically different from each other (in the distribution of observations and/or the labels), current explorations implicitly assume that all learning agents receive data from the same distribution. We remove this simplifying assumption by allowing peers to draw from arbitrary data distributions and be based on arbitrary spaces, thus formalizing the general problem of distributed classification. We find that this problem is difficult because it does not admit state-of-the-art solutions in distributed classification. We also discuss the relation between the general problem and transfer learning, and show that transfer learning approaches cannot be trivially fitted to solve the problem. Finally, we present a list of open research problems in this challenging field.

databases, information systems, and peer-to-peer computing | 2008