Is this you? Create Your Porfile

Peilin Zhao

Agency for Science, Technology and Research

Archive Network Publication Hotspot Collaboration

Network

Latest external collaboration on country level. Dive into details by clicking on the dots.

Explore More

Hotspot

Dive into the research topics where Peilin Zhao is active.

Explore More

Publication

Featured researches published by Peilin Zhao.

IEEE Transactions on Knowledge and Data Engineering | 2014

Online Feature Selection and Its Applications

Jialei Wang; Peilin Zhao; Steven C. H. Hoi; Rong Jin

Feature selection is an important technique for data mining. Despite its importance, most studies of feature selection are restricted to batch learning. Unlike traditional batch learning methods, online learning represents a promising family of efficient and scalable machine learning algorithms for large-scale applications. Most existing studies of online learning require accessing all the attributes/features of training instances. Such a classical setting is not always appropriate for real-world applications when data instances are of high dimensionality or it is expensive to acquire the full set of attributes/features. To address this limitation, we investigate the problem of online feature selection (OFS) in which an online learner is only allowed to maintain a classifier involved only a small and fixed number of features. The key challenge of online feature selection is how to make accurate prediction for an instance using a small number of active features. This is in contrast to the classical setup of online learning where all the features can be used for prediction. We attempt to tackle this challenge by studying sparsity regularization and truncation techniques. Specifically, this article addresses two different tasks of online feature selection: 1) learning with full input, where an learner is allowed to access all the features to decide the subset of active features, and 2) learning with partial input, where only a limited number of features is allowed to be accessed for each instance by the learner. We present novel algorithms to solve each of the two problems and give their performance analysis. We evaluate the performance of the proposed algorithms for online feature selection on several public data sets, and demonstrate their applications to real-world problems including image classification in computer vision and microarray gene expression analysis in bioinformatics. The encouraging results of our experiments validate the efficacy and efficiency of the proposed techniques.

acm multimedia | 2013

Online multimodal deep similarity learning with application to image retrieval

Pengcheng Wu; Steven C. H. Hoi; Hao Xia; Peilin Zhao; Dayong Wang; Chunyan Miao

Recent years have witnessed extensive studies on distance metric learning (DML) for improving similarity search in multimedia information retrieval tasks. Despite their successes, most existing DML methods suffer from two critical limitations: (i) they typically attempt to learn a linear distance function on the input feature space, in which the assumption of linearity limits their capacity of measuring the similarity on complex patterns in real-world applications; (ii) they are often designed for learning distance metrics on uni-modal data, which may not effectively handle the similarity measures for multimedia objects with multimodal representations. To address these limitations, in this paper, we propose a novel framework of online multimodal deep similarity learning (OMDSL), which aims to optimally integrate multiple deep neural networks pretrained with stacked denoising autoencoder. In particular, the proposed framework explores a unified two-stage online learning scheme that consists of (i) learning a flexible nonlinear transformation function for each individual modality, and (ii) learning to find the optimal combination of multiple diverse modalities simultaneously in a coherent process. We conduct an extensive set of experiments to evaluate the performance of the proposed algorithms for multimodal image retrieval tasks, in which the encouraging results validate the effectiveness of the proposed technique.

Machine Learning | 2013

Online Multiple Kernel Classification

Steven C. H. Hoi; Rong Jin; Peilin Zhao; Tianbao Yang

Although both online learning and kernel learning have been studied extensively in machine learning, there is limited effort in addressing the intersecting research problems of these two important topics. As an attempt to fill the gap, we address a new research problem, termed Online Multiple Kernel Classification (OMKC), which learns a kernel-based prediction function by selecting a subset of predefined kernel functions in an online learning fashion. OMKC is in general more challenging than typical online learning because both the kernel classifiers and the subset of selected kernels are unknown, and more importantly the solutions to the kernel classifiers and their combination weights are correlated. The proposed algorithms are based on the fusion of two online learning algorithms, i.e., the Perceptron algorithm that learns a classifier for a given kernel, and the Hedge algorithm that combines classifiers by linear weights. We develop stochastic selection strategies that randomly select a subset of kernels for combination and model updating, thus improving the learning efficiency. Our empirical study with 15 data sets shows promising performance of the proposed algorithms for OMKC in both learning efficiency and prediction accuracy.

Proceedings of the 1st International Workshop on Big Data, Streams and Heterogeneous Source Mining | 2012

Online feature selection for mining big data

Steven C. H. Hoi; Jialei Wang; Peilin Zhao; Rong Jin

Most studies of online learning require accessing all the attributes/features of training instances. Such a classical setting is not always appropriate for real-world applications when data instances are of high dimensionality or the access to it is expensive to acquire the full set of attributes/features. To address this limitation, we investigate the problem of Online Feature Selection (OFS) in which the online learner is only allowed to maintain a classifier involved a small and fixed number of features. The key challenge of Online Feature Selection is how to make accurate prediction using a small and fixed number of active features. This is in contrast to the classical setup of online learning where all the features are active and can be used for prediction. We address this challenge by studying sparsity regularization and truncation techniques. Specifically, we present an effective algorithm to solve the problem, give the theoretical analysis, and evaluate the empirical performance of the proposed algorithms for online feature selection on several public datasets. We also demonstrate the application of our online feature selection technique to tackle real-world problems of big data mining, which is significantly more scalable than some well-known batch feature selection algorithms. The encouraging results of our experiments validate the efficacy and efficiency of the proposed techniques for large-scale applications.

web search and data mining | 2011

Mining social images with distance metric learning for automated image tagging

Pengcheng Wu; Steven C. H. Hoi; Peilin Zhao; Ying He

With the popularity of various social media applications, massive social images associated with high quality tags have been made available in many social media web sites nowadays. Mining social images on the web has become an emerging important research topic in web search and data mining. In this paper, we propose a machine learning framework for mining social images and investigate its application to automated image tagging. To effectively discover knowledge from social images that are often associated with multimodal contents (including visual images and textual tags), we propose a novel Unified Distance Metric Learning (UDML) scheme, which not only exploits both visual and textual contents of social images, but also effectively unifies both inductive and transductive metric learning techniques in a systematic learning framework. We further develop an efficient stochastic gradient descent algorithm for solving the UDML optimization task and prove the convergence of the algorithm. By applying the proposed technique to the automated image tagging task in our experiments, we demonstrate that our technique is empirically effective and promising for mining social images towards some real applications.

IEEE Transactions on Pattern Analysis and Machine Intelligence | 2014

Online Multiple Kernel Similarity Learning for Visual Search

Hao Xia; Steven C. H. Hoi; Rong Jin; Peilin Zhao

Recent years have witnessed a number of studies on distance metric learning to improve visual similarity search in content-based image retrieval (CBIR). Despite their successes, most existing methods on distance metric learning are limited in two aspects. First, they usually assume the target proximity function follows the family of Mahalanobis distances, which limits their capacity of measuring similarity of complex patterns in real applications. Second, they often cannot effectively handle the similarity measure of multimodal data that may originate from multiple resources. To overcome these limitations, this paper investigates an online kernel similarity learning framework for learning kernel-based proximity functions which goes beyond the conventional linear distance metric learning approaches. Based on the framework, we propose a novel online multiple kernel similarity (OMKS) learning method which learns a flexible nonlinear proximity function with multiple kernels to improve visual similarity search in CBIR. We evaluate the proposed technique for CBIR on a variety of image data sets in which encouraging results show that OMKS outperforms the state-of-the-art techniques significantly.

IEEE Transactions on Knowledge and Data Engineering | 2014

Cost-Sensitive Online Classification

Jialei Wang; Peilin Zhao; Steven C. H. Hoi

Both cost-sensitive classification and online learning have been studied extensively in data mining and machine learning communities, respectively. It is a bit surprising that there was very limited comprehensive study for addressing an important intersecting problem, that is, cost-sensitive online classification. In this paper, we formally study this problem, and propose a new framework for cost-sensitive online classification by exploiting the idea of online gradient descent techniques. Based on the framework, we propose a family of cost-sensitive online classification algorithms, which are designed to directly optimize two well-known cost-sensitive measures: (i) maximization of weighted sum of sensitivity and specificity, and (ii) minimization of weighted misclassification cost. We analyze the theoretical bounds of the cost-sensitive measures made by the proposed algorithms, and extensively examine their empirical performance on a variety of cost-sensitive online classification tasks.

knowledge discovery and data mining | 2013

Cost-sensitive online active learning with application to malicious URL detection

Peilin Zhao; Steven C. H. Hoi

Malicious Uniform Resource Locator (URL) detection is an important problem in web search and mining, which plays a critical role in internet security. In literature, many existing studies have attempted to formulate the problem as a regular supervised binary classification task, which typically aims to optimize the prediction accuracy. However, in a real-world malicious URL detection task, the ratio between the number of malicious URLs and legitimate URLs is highly imbalanced, making it very inappropriate for simply optimizing the prediction accuracy. Besides, another key limitation of the existing work is to assume a large amount of training data is available, which is impractical as the human labeling cost could be potentially quite expensive. To solve these issues, in this paper, we present a novel framework of Cost-Sensitive Online Active Learning (CSOAL), which only queries a small fraction of training data for labeling and directly optimizes two cost-sensitive measures to address the class-imbalance issue. In particular, we propose two CSOAL algorithms and analyze their theoretical performance in terms of cost-sensitive bounds. We conduct an extensive set of experiments to examine the empirical performance of the proposed algorithms for a large-scale challenging malicious URL detection task, in which the encouraging results showed that the proposed technique by querying an extremely small-sized labeled data (about 0.5% out of 1-million instances) can achieve better or highly comparable classification performance in comparison to the state-of-the-art cost-insensitive and cost-sensitive online classification algorithms using a huge amount of labeled data.

PLOS Computational Biology | 2016

Neighborhood Regularized Logistic Matrix Factorization for Drug-Target Interaction Prediction

Yong Liu; Min Wu; Chunyan Miao; Peilin Zhao; Xiaoli Li

In pharmaceutical sciences, a crucial step of the drug discovery process is the identification of drug-target interactions. However, only a small portion of the drug-target interactions have been experimentally validated, as the experimental validation is laborious and costly. To improve the drug discovery efficiency, there is a great need for the development of accurate computational approaches that can predict potential drug-target interactions to direct the experimental verification. In this paper, we propose a novel drug-target interaction prediction algorithm, namely neighborhood regularized logistic matrix factorization (NRLMF). Specifically, the proposed NRLMF method focuses on modeling the probability that a drug would interact with a target by logistic matrix factorization, where the properties of drugs and targets are represented by drug-specific and target-specific latent vectors, respectively. Moreover, NRLMF assigns higher importance levels to positive observations (i.e., the observed interacting drug-target pairs) than negative observations (i.e., the unknown pairs). Because the positive observations are already experimentally verified, they are usually more trustworthy. Furthermore, the local structure of the drug-target interaction data has also been exploited via neighborhood regularization to achieve better prediction accuracy. We conducted extensive experiments over four benchmark datasets, and NRLMF demonstrated its effectiveness compared with five state-of-the-art approaches.

ACM Transactions on Knowledge Discovery From Data | 2013

Confidence Weighted Mean Reversion Strategy for Online Portfolio Selection

Bin Li; Steven C. H. Hoi; Peilin Zhao; Vivekanand Gopalkrishnan

Online portfolio selection has been attracting increasing attention from the data mining and machine learning communities. All existing online portfolio selection strategies focus on the first order information of a portfolio vector, though the second order information may also be beneficial to a strategy. Moreover, empirical evidence shows that relative stock prices may follow the mean reversion property, which has not been fully exploited by existing strategies. This article proposes a novel online portfolio selection strategy named Confidence Weighted Mean Reversion (CWMR). Inspired by the mean reversion principle in finance and confidence weighted online learning technique in machine learning, CWMR models the portfolio vector as a Gaussian distribution, and sequentially updates the distribution by following the mean reversion trading principle. CWMR’s closed-form updates clearly reflect the mean reversion trading idea. We also present several variants of CWMR algorithms, including a CWMR mixture algorithm that is theoretical universal. Empirically, CWMR strategy is able to effectively exploit the power of mean reversion for online portfolio selection. Extensive experiments on various real markets show that the proposed strategy is superior to the state-of-the-art techniques. The experimental testbed including source codes and data sets is available online.

Explore More