Guoliang He | Researchain

Publication

Featured research published by Guoliang He.

Neurocomputing | 2015

Early classification on multivariate time series

Guoliang He; Yong Duan; Rong Peng; Xiao-Yuan Jing; Tieyun Qian; Lingling Wang

Multivariate time series (MTS) classification is an important topic in time series data mining and has attracted great interest in recent years. However, early classification on MTS data remains largely a challenging problem. To address this problem without sacrificing classification performance, we focus on discovering hidden knowledge in the data that supports early classification in an explainable way. First, we introduce MCFEC (Mining Core Feature for Early Classification), a method that obtains distinctive and early shapelets as core features for each variable independently. Then, we introduce two methods for early classification on MTS based on these core features. Experimental results on both synthetic and real-world datasets show that the proposed methods can achieve effective early classification on MTS.
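
The general idea of issuing a prediction as soon as a discriminative shapelet appears in the stream can be sketched in Python. This is a minimal illustration under stated assumptions, not MCFEC itself: the shapelets, their per-variable distance thresholds, and the tuple format `(variable, pattern, threshold, label)` are hypothetical inputs, and matching uses plain Euclidean distance on the newest window of each variable.

```python
import math
from typing import List, Optional, Sequence, Tuple

# Hypothetical shapelet format: (variable index, pattern, distance threshold, class label)
Shapelet = Tuple[int, Sequence[float], float, str]


def euclidean(a: Sequence[float], b: Sequence[float]) -> float:
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def classify_early(stream: Sequence[Sequence[float]],
                   shapelets: List[Shapelet]) -> Tuple[Optional[str], int]:
    """Consume an MTS one time point at a time; return (label, t) as soon as any
    shapelet matches the newest window of its variable, else (None, t_end)."""
    history: List[List[float]] = []  # history[v] = values of variable v seen so far
    t = 0
    for t, point in enumerate(stream, start=1):
        if not history:
            history = [[] for _ in point]
        for v, value in enumerate(point):
            history[v].append(value)
        # Older windows were already tested at earlier steps, so only the
        # window ending at time t needs to be checked now.
        for var, pattern, threshold, label in shapelets:
            m = len(pattern)
            if t >= m and euclidean(history[var][t - m:t], pattern) <= threshold:
                return label, t
    return None, t


stream = [(0.0, 5.0), (0.1, 5.0), (1.0, 5.0), (2.0, 5.1), (3.0, 5.0)]
shapelets = [(0, [1.0, 2.0, 3.0], 0.5, "rising"), (1, [0.0, 0.0], 0.1, "drop")]
print(classify_early(stream, shapelets))  # ('rising', 5)
```

Checking only the window that ends at the current time point keeps each step cheap; the trade-off between earliness and accuracy is controlled entirely by the (assumed) thresholds.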

Neurocomputing | 2016

Tri-Training for authorship attribution with limited training data

Tieyun Qian; Bing Liu; Li Chen; Zhiyong Peng; Ming Zhong; Guoliang He; Xuhui Li; Gang Xu

Conference on Information and Knowledge Management | 2013

Early prediction on imbalanced multivariate time series

Guoliang He; Yong Duan; Tieyun Qian; Xu Chen

Multivariate time series (MTS) classification is an important topic in time series data mining, and many efficient models and techniques have been proposed for it. However, early classification on imbalanced MTS data remains largely an open problem. To address this issue, we adopt a method that combines multiple under-sampling with dynamic subspace generation to obtain initial training subsets, and each subset is used to train a base learner. An ensemble classifier built from these base learners is then used for early classification on imbalanced MTS data. Experimental results show that the proposed methods can achieve effective early prediction on imbalanced MTS data.
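
A rough sketch of the under-sampling ensemble idea follows. It assumes each MTS (or MTS prefix) has already been turned into a fixed-length feature vector, uses a random feature subset as a stand-in for the paper's dynamic subspace generation, and uses a nearest-centroid base learner with majority voting; all of these are illustrative choices, not the authors' exact design.

```python
import random
from collections import Counter
from typing import Dict, List, Sequence, Tuple

Vector = Sequence[float]
Model = Tuple[List[int], Dict[str, List[float]]]  # (feature subset, class centroids)


def fit_centroids(X: List[Vector], y: List[str], features: List[int]) -> Dict[str, List[float]]:
    """Per-class mean of the selected features (a deliberately simple base learner)."""
    sums: Dict[str, List[float]] = {}
    counts: Counter = Counter()
    for x, label in zip(X, y):
        acc = sums.setdefault(label, [0.0] * len(features))
        for j, f in enumerate(features):
            acc[j] += x[f]
        counts[label] += 1
    return {label: [s / counts[label] for s in acc] for label, acc in sums.items()}


def build_ensemble(X, y, minority, n_learners=5, n_features=2, seed=0) -> List[Model]:
    rng = random.Random(seed)
    minority_idx = [i for i, label in enumerate(y) if label == minority]
    majority_idx = [i for i, label in enumerate(y) if label != minority]
    ensemble = []
    for _ in range(n_learners):
        # Under-sample the majority class down to the size of the minority class.
        sample = minority_idx + rng.sample(majority_idx, min(len(majority_idx), len(minority_idx)))
        # Random feature subspace (stand-in for dynamic subspace generation).
        features = sorted(rng.sample(range(len(X[0])), n_features))
        centroids = fit_centroids([X[i] for i in sample], [y[i] for i in sample], features)
        ensemble.append((features, centroids))
    return ensemble


def predict(ensemble: List[Model], x: Vector) -> str:
    def nearest(features, centroids):
        return min(centroids, key=lambda label: sum(
            (x[f] - c) ** 2 for f, c in zip(features, centroids[label])))
    # Majority vote over the balanced base learners.
    votes = Counter(nearest(features, centroids) for features, centroids in ensemble)
    return votes.most_common(1)[0][0]
```

Each base learner sees a balanced sample, so no single learner is dominated by the majority class, while the vote recovers the information discarded by any one under-sampling.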

Neurocomputing | 2016

Probabilistic skyline queries on uncertain time series

Guoliang He; Lu Chen; Chen Zeng; Qiaoxian Zheng; Guofu Zhou

Data uncertainty is common and inherent in most applications. Although skyline queries on time series within an interval have attracted great interest in recent years, skyline queries on uncertain time series remain an open problem. To handle this issue, we model skyline queries on uncertain time series and develop a two-step procedure to answer probabilistic skyline queries on such data. First, three effective pruning techniques are proposed to obtain the skyline in the interval. Next, two simple methods are proposed to compute the skyline probability of each uncertain time series. For online skyline queries, we also introduce a solution that improves the efficiency of the pruning strategies by sharing computation between two adjacent intervals. Experiments verify the effectiveness of probabilistic skylines and the efficiency and scalability of our algorithms.
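
For reference, the standard skyline probability for discrete uncertain objects can be computed by brute force, without any pruning: the probability of U is the sum over its instances u of p(u) times, for every other series V, one minus the total probability of V's instances that dominate u. The sketch assumes each uncertain time series, restricted to the query interval, is given as a set of possible value vectors with probabilities summing to 1, that series are mutually independent, and that smaller values are preferred; it is a baseline, not the paper's pruning techniques.

```python
from typing import Dict, List, Sequence, Tuple

# Hypothetical model: (values over the query interval, probability of this instance)
Instance = Tuple[Sequence[float], float]


def dominates(a: Sequence[float], b: Sequence[float]) -> bool:
    """a dominates b if it is no worse everywhere and strictly better somewhere
    (smaller is assumed to be better)."""
    return all(x <= y for x, y in zip(a, b)) and any(x < y for x, y in zip(a, b))


def skyline_probabilities(series: Dict[str, List[Instance]]) -> Dict[str, float]:
    result = {}
    for name, instances in series.items():
        total = 0.0
        for values, p in instances:
            # Probability that no other (independent) series dominates this instance.
            survive = 1.0
            for other, other_instances in series.items():
                if other != name:
                    survive *= 1.0 - sum(q for v, q in other_instances if dominates(v, values))
            total += p * survive
        result[name] = total
    return result


def probabilistic_skyline(series: Dict[str, List[Instance]], threshold: float) -> List[str]:
    """Series whose skyline probability reaches the query threshold."""
    return sorted(n for n, p in skyline_probabilities(series).items() if p >= threshold)


data = {"A": [([1.0, 1.0], 0.5), ([3.0, 3.0], 0.5)], "B": [([2.0, 2.0], 1.0)]}
print(skyline_probabilities(data))  # {'A': 0.5, 'B': 0.5}
```

This baseline compares every pair of instances, which is exactly the cost that pruning strategies aim to avoid on large datasets.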

International ACM SIGIR Conference on Research and Development in Information Retrieval | 2014

Co-training on authorship attribution with very few labeled examples: methods vs. views

Tieyun Qian; Bing Liu; Ming Zhong; Guoliang He

Authorship attribution (AA) aims to identify the authors of a set of documents. Traditional studies in this area often assume that a large set of labeled documents is available for training. In real life, however, collecting a large set of labeled data is hard or expensive. For example, in the online review domain, most reviewers (authors) write only a few reviews, which are not enough to serve as training data for accurate classification. In this paper, we present a novel two-view co-training framework that iteratively identifies the authors of a few unlabeled documents to augment the training set. The key idea is to first represent each document in several distinct views and then adopt a co-training technique to exploit the large amount of unlabeled documents. Starting from 10 training texts per author, we systematically evaluate the effectiveness of co-training for authorship attribution with limited labeled data. Two methods and three views are investigated: the logistic regression (LR) and support vector machine (SVM) methods, and the character, lexical, and syntactic views. The experimental results show that LR is particularly effective for improving co-training in AA, and that the lexical view performs best among the three views when combined with an LR classifier. Furthermore, within the co-training framework there is little difference between using one classifier on two views and two classifiers on one view; instead, it is the learning approach and the view that play a critical role.
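
The generic two-view co-training loop can be sketched as follows. To keep the example self-contained, a nearest-centroid classifier stands in for LR or SVM, each document is assumed to be pre-encoded as two numeric feature vectors (one per view), and confidence is taken to be the distance margin between the two closest author centroids; these are hypothetical simplifications, not the paper's setup.

```python
import math
from typing import Dict, List, Sequence, Tuple

Doc = Tuple[Sequence[float], Sequence[float]]  # (view-1 features, view-2 features)


def fit_centroids(vectors, labels) -> Dict[str, List[float]]:
    sums: Dict[str, List[float]] = {}
    counts: Dict[str, int] = {}
    for v, label in zip(vectors, labels):
        acc = sums.setdefault(label, [0.0] * len(v))
        for i, x in enumerate(v):
            acc[i] += x
        counts[label] = counts.get(label, 0) + 1
    return {label: [s / counts[label] for s in acc] for label, acc in sums.items()}


def predict_with_margin(centroids, v) -> Tuple[str, float]:
    """Nearest-centroid author plus a confidence margin (gap to the runner-up)."""
    ranked = sorted((math.dist(v, c), label) for label, c in centroids.items())
    margin = ranked[1][0] - ranked[0][0] if len(ranked) > 1 else math.inf
    return ranked[0][1], margin


def co_train(labeled: List[Doc], labels: List[str], unlabeled: List[Doc],
             rounds: int = 5, per_round: int = 1) -> Tuple[List[Doc], List[str]]:
    labeled, labels, pool = list(labeled), list(labels), list(unlabeled)
    for _ in range(rounds):
        for view in (0, 1):
            if not pool:
                return labeled, labels
            # Train this view's classifier on the current, growing labeled set.
            centroids = fit_centroids([d[view] for d in labeled], labels)
            scored = sorted(((predict_with_margin(centroids, d[view]), i) for i, d in enumerate(pool)),
                            key=lambda s: s[0][1], reverse=True)
            chosen = scored[:per_round]
            # One view's most confident predictions become training data for both views.
            for (label, _), i in chosen:
                labeled.append(pool[i])
                labels.append(label)
            picked = {i for _, i in chosen}
            pool = [d for i, d in enumerate(pool) if i not in picked]
    return labeled, labels
```

The essential co-training step is the hand-off: documents that one view labels confidently become training data for the other view, which is how a few labeled texts per author can be grown into a larger training set.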

International Conference on Tools with Artificial Intelligence | 2015

Active Learning for Multivariate Time Series Classification with Positive Unlabeled Data

Guoliang He; Yong Duan; Yifei Li; Tieyun Qian; Jinrong He; Xiangyang Jia

Traditional time series classification with supervised learning algorithms requires a large set of labeled training data. In reality, labeled data are often scarce, while unlabeled data are abundant. However, manually labeling these unlabeled examples is time-consuming and expensive, and sometimes even impossible. Although some semi-supervised and active learning methods have been proposed for univariate time series data, little work has addressed positive and unlabeled data for multivariate time series (MTS) classification, because such data are more complex. In this paper, we focus on active learning for multivariate time series classification with positive and unlabeled data. First, we propose a sample selection strategy to find the most informative unlabeled examples for manual labeling. Second, we introduce two active learning approaches to obtain a high-confidence training dataset for classification. Experiments on real datasets demonstrate the validity of the proposed approaches.

Advanced Data Mining and Applications | 2013

Detecting Professional Spam Reviewers

Junlong Huang; Tieyun Qian; Guoliang He; Ming Zhong; Qingxi Peng

Spam reviewers are becoming more professional. The common approach to spam reviewer detection is based mainly on the similarities among reviews or ratings of the same products. Applying this approach to professional spammer detection faces several difficulties. First, some review systems have started to impose limitations; for example, duplicate submissions from the same ID on one product are forbidden. Second, professional spammers have greatly improved their writing skills and consciously use diverse expressions in their reviews. In this paper, we present a novel model for detecting professional spam reviewers that combines posting frequency and text sentiment strength by analyzing writing and behavior styles. Specifically, we first introduce an approach for counting posting frequency based on a sliding window. We then evaluate sentiment strength based on the sentiment words in the text. Finally, we present a linear combination model. Experimental results on a real dataset from Dianping.com demonstrate the effectiveness of the proposed method.
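
The two signals and their linear combination might look like this in a minimal sketch. The time unit, window length, sentiment lexicon, the "share of lexicon words" definition of strength, and the weights `w_freq`/`w_sent` are all hypothetical; in practice the features would be normalized and the weights fitted on labeled data.

```python
from collections import deque
from typing import List, Sequence

# Hypothetical sentiment lexicon; a real system would use a curated one.
SENTIMENT_WORDS = {"amazing", "best", "perfect", "terrible", "worst", "awful"}


def max_posts_in_window(timestamps: Sequence[float], window: float) -> int:
    """Largest number of posts inside any sliding window (t - window, t]."""
    recent: deque = deque()
    best = 0
    for t in sorted(timestamps):
        recent.append(t)
        while recent and recent[0] <= t - window:  # evict posts that slid out
            recent.popleft()
        best = max(best, len(recent))
    return best


def sentiment_strength(text: str) -> float:
    """Share of tokens that appear in the sentiment lexicon."""
    words = [w.strip(".,!?").lower() for w in text.split()]
    return sum(w in SENTIMENT_WORDS for w in words) / len(words) if words else 0.0


def spam_score(timestamps: Sequence[float], reviews: List[str],
               window: float = 24.0, w_freq: float = 0.1, w_sent: float = 1.0) -> float:
    """Linear combination of peak posting frequency and mean sentiment strength."""
    freq = max_posts_in_window(timestamps, window)
    strength = sum(map(sentiment_strength, reviews)) / len(reviews) if reviews else 0.0
    return w_freq * freq + w_sent * strength


# Three posts within 24 hours, written in strongly emotional language.
print(spam_score([0, 1, 2, 30], ["Amazing food, best place!", "Perfect!"]))
```

Using the peak count in any window, rather than an overall average, makes short bursts of posting stand out even for reviewers who are otherwise inactive.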

Web-Age Information Management | 2014

Authorship Attribution with Very Few Labeled Data: A Co-training Approach

Mengdi Fan; Tieyun Qian; Li Chen; Bin Liu; Ming Zhong; Guoliang He

Authorship attribution refers to the task of identifying the authors of a set of documents. Early studies in this area either used book-length texts or assumed that a large number of training documents was available. The focus of modern authorship attribution has shifted to the analysis of short online texts. This shift is realistic, since in real life it is hard to collect training texts. However, the small size of the training data makes authorship attribution much more difficult. In this paper, we present a novel co-training method that iteratively labels a few unlabeled documents to augment the training set. Specifically, each document is first partitioned into two distinct views, namely a lexical view and a syntactic view. Then a two-view semi-supervised method, co-training, is adopted to exploit the large amount of unlabeled documents. Our experimental results on real data show that the proposed method can effectively exploit unlabeled data to improve classification performance.

Database and Expert Systems Applications | 2016

A Reverse Nearest Neighbor Based Active Semi-supervised Learning Method for Multivariate Time Series Classification

Yifei Li; Guoliang He; Xuewen Xia; Yuanxiang Li

Time series exist widely in many areas. In reality, labeled time series data are often scarce, while unlabeled data are abundant. Manually labeling these unlabeled examples is time-consuming and expensive, and sometimes even impossible. To reduce manual cost and obtain high-confidence labeled training data for multivariate time series classification, this paper proposes a reverse nearest neighbor based active semi-supervised learning method. First, a sampling strategy based on the information entropy and distribution density of the training data is introduced to select the most informative examples for manual annotation. Second, using the examples newly labeled by experts, a reverse nearest neighbor based semi-supervised learning method automatically and accurately labels some confident examples. We evaluate our work with a comprehensive set of experiments on diverse multivariate time series data. Experimental results show that our approach can obtain confidently labeled training data at a lower manual cost.
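
The reverse-nearest-neighbor labeling step can be sketched as below: once an expert labels an example q, every unlabeled example whose own nearest neighbor is q inherits q's label. The sketch assumes each MTS is flattened into a fixed-length vector compared with Euclidean distance (the paper may use a different distance), uses brute-force search, and omits the entropy- and density-based selection step; the function names are hypothetical.

```python
import math
from typing import Dict, List, Sequence


def nearest_neighbor(i: int, points: List[Sequence[float]]) -> int:
    """Index of the point closest to points[i] (requires at least two points)."""
    return min((j for j in range(len(points)) if j != i),
               key=lambda j: math.dist(points[i], points[j]))


def reverse_nearest_neighbors(q: int, points: List[Sequence[float]]) -> List[int]:
    """Indices whose own nearest neighbor is q (brute force, O(n^2))."""
    return [i for i in range(len(points)) if i != q and nearest_neighbor(i, points) == q]


def propagate_label(q: int, label: str, points: List[Sequence[float]],
                    labels: Dict[int, str]) -> List[int]:
    """Record the expert's label for q and copy it to q's unlabeled reverse
    nearest neighbors; return the indices labeled automatically."""
    labels[q] = label
    newly = []
    for i in reverse_nearest_neighbors(q, points):
        if i not in labels:
            labels[i] = label
            newly.append(i)
    return newly


points = [(0.0, 0.0), (0.5, 0.0), (0.0, 0.6), (5.0, 5.0), (5.4, 5.0)]
labels: Dict[int, str] = {}
print(propagate_label(0, "normal", points, labels))  # [1, 2]
```

Reverse nearest neighbors are a conservative choice for automatic labeling: a point inherits a label only if the expert-labeled example is the single closest point to it, which limits label noise compared with labeling q's k nearest neighbors.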

Database and Expert Systems Applications | 2014

Early Classification on Multivariate Time Series with Core Features

Guoliang He; Yong Duan; Guofu Zhou; Lingling Wang

Multivariate time series (MTS) classification is an important topic in time series data mining and has attracted great interest in recent years. However, early classification on MTS data remains largely a challenging problem. To address this problem, we focus on discovering hidden knowledge in the data that supports early classification in an explainable way. First, we introduce MCFEC (Mining Core Feature for Early Classification), a method that obtains distinctive and early shapelets as core features for each variable independently. Then, we introduce two methods for early classification on MTS based on these core features. Experimental results on both synthetic and real-world datasets show that the proposed methods can achieve effective early classification on MTS.
