Publication


Featured research published by Guanhua Tian.


Neurocomputing | 2016

Semantic expansion using word embedding clustering and convolutional neural network for improving short text classification

Peng Wang; Bo Xu; Jiaming Xu; Guanhua Tian; Cheng-Lin Liu; Hong-Wei Hao

Text classification can help users effectively handle and exploit useful information hidden in large-scale documents. However, the sparsity of data and the semantic sensitivity to context often hinder the classification performance of short texts. To overcome these weaknesses, we propose a unified framework that expands short texts based on word embedding clustering and convolutional neural networks (CNN). Empirically, semantically related words are usually close to each other in embedding spaces. Thus, we first discover semantic cliques via fast clustering. Then, by using additive composition over word embeddings from context with variable window widths, we compute the representations of multi-scale semantic units (n-grams that carry the dominant meaning of the text; varying n exploits multi-scale contextual information) in short texts. In embedding spaces, the restricted nearest word embeddings (NWEs) of the semantic units (restricted by a preset Euclidean distance threshold between semantic cliques and semantic units, which guards against outliers) are chosen to constitute expanded matrices, with the semantic cliques serving as supervision information. Finally, for a short text, the projected matrix (obtained by table lookup, encoding unigram-level features) and the expanded matrices are combined and fed into the CNN in parallel. Experimental results on two open benchmarks validate the effectiveness of the proposed method.
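As an illustration of the expansion step, here is a minimal NumPy sketch using toy 2-D embeddings. It snaps each multi-scale semantic unit to its nearest clique centroid when that centroid falls within the distance threshold; the paper instead selects restricted nearest word embeddings, and all names and data here are illustrative assumptions:

```python
import numpy as np

def semantic_units(embeddings, max_n=2):
    """Additive composition over sliding windows of width 1..max_n
    yields multi-scale semantic-unit vectors for one short text."""
    units = []
    for n in range(1, max_n + 1):
        for i in range(len(embeddings) - n + 1):
            units.append(np.sum(embeddings[i:i + n], axis=0))
    return np.array(units)

def expand(units, clique_centroids, threshold):
    """Keep only units whose nearest clique centroid lies within a
    preset Euclidean distance threshold (the 'restricted' condition)."""
    kept = []
    for u in units:
        d = np.linalg.norm(clique_centroids - u, axis=1)
        if d.min() <= threshold:
            kept.append(clique_centroids[d.argmin()])
    return np.array(kept)

# Toy example: a 3-word text in 2-D, two clique centroids.
words = np.array([[1.0, 0.0], [0.9, 0.1], [0.0, 1.0]])
cliques = np.array([[1.0, 0.0], [0.0, 1.0]])
units = semantic_units(words, max_n=2)   # 3 unigrams + 2 bigrams = 5 units
expanded = expand(units, cliques, threshold=0.5)
```

Here the two bigram units fall outside the threshold and are filtered out, mimicking the outlier protection the restriction is meant to provide.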


North American Chapter of the Association for Computational Linguistics | 2015

Short Text Clustering via Convolutional Neural Networks

Jiaming Xu; Peng Wang; Guanhua Tian; Bo Xu; Jun Zhao; Fangyuan Wang; Hongwei Hao

Short text clustering has become an increasingly important task with the popularity of social media, and it is a challenging problem due to the sparseness of text representations. In this paper, we propose Short Text Clustering via Convolutional neural networks (abbreviated STCC), which benefits clustering by imposing one constraint on the learned features through a self-taught learning framework, without using any external tags/labels. First, we embed the original keyword features into compact binary codes with a locality-preserving constraint. Then, word embeddings are explored and fed into convolutional neural networks to learn deep feature representations, with the output units fitting the pre-trained binary codes during training. After obtaining the learned representations, we use K-means to cluster them. Our extensive experimental study on two public short text datasets shows that the deep feature representations learned by our approach achieve significantly better clustering performance than existing features such as term frequency-inverse document frequency, Laplacian eigenvectors, and average embeddings.


Neural Networks | 2017

Self-Taught Convolutional Neural Networks for Short Text Clustering

Jiaming Xu; Bo Xu; Peng Wang; Suncong Zheng; Guanhua Tian; Jun Zhao

Short text clustering is a challenging problem due to the sparseness of text representations. Here we propose a flexible Self-Taught Convolutional neural network framework for Short Text Clustering (dubbed STC2), which can flexibly and successfully incorporate more useful semantic features and learn non-biased deep text representations in an unsupervised manner. In our framework, the original raw text features are first embedded into compact binary codes using an existing unsupervised dimensionality reduction method. Then, word embeddings are explored and fed into convolutional neural networks to learn deep feature representations, while the output units are trained to fit the pre-trained binary codes. Finally, we obtain the clusters by applying K-means to the learned representations. Extensive experimental results demonstrate that the proposed framework is effective and flexible, and outperforms several popular clustering methods on three public short text datasets.
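A rough sketch of the framework's skeleton, with plain PCA (via SVD) standing in for the unsupervised dimensionality reduction step and the CNN training stage elided: the binary codes B are the targets a CNN's output units would be trained to fit, and here we simply cluster the low-dimensional projection. All data and names are toy assumptions:

```python
import numpy as np

def reduce_and_binarize(X, n_bits):
    """Unsupervised dimensionality reduction (plain PCA via SVD as a
    stand-in), then median thresholding: Z is the low-dim representation,
    B the compact binary codes used as training targets."""
    Xc = X - X.mean(axis=0)
    _, _, Vt = np.linalg.svd(Xc, full_matrices=False)
    Z = Xc @ Vt[:n_bits].T
    B = (Z > np.median(Z, axis=0)).astype(int)
    return Z, B

def kmeans(X, init, iters=50):
    """Plain Lloyd's algorithm with explicit initial center indices."""
    centers = X[init].astype(float)
    for _ in range(iters):
        labels = np.argmin(((X[:, None] - centers) ** 2).sum(-1), axis=1)
        for j in range(len(init)):
            if (labels == j).any():
                centers[j] = X[labels == j].mean(axis=0)
    return labels

rng = np.random.default_rng(0)
# Toy feature matrix: two well-separated groups of 'short texts'.
X = np.vstack([rng.normal(0, 0.1, (10, 5)), rng.normal(3, 0.1, (10, 5))])
Z, B = reduce_and_binarize(X, n_bits=4)
labels = kmeans(Z, init=[0, 10])  # cluster the learned representations
```

On this toy data the first code bit already separates the two groups, which is the property the locality-preserving binarization is meant to provide.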


Conference on Information and Knowledge Management | 2015

Large-scale Knowledge Base Completion: Inferring via Grounding Network Sampling over Selected Instances

Zhuoyu Wei; Jun Zhao; Kang Liu; Zhenyu Qi; Zhengya Sun; Guanhua Tian

Constructing large-scale knowledge bases has attracted much attention in recent years, for which Knowledge Base Completion (KBC) is a key technique. In general, inferring new facts in a large-scale knowledge base is not a trivial task: the large number of candidate facts has caused the majority of previous approaches to fail. Inference approaches can achieve high precision for accurate formulas, but they must infer candidate instances one by one, and extremely large candidate sets bog them down in expensive calculations. In contrast, existing embedding-based methods can easily calculate a similarity-based score for each candidate instance instead of performing inference, so they can handle large-scale data; however, they do not consider explicit logical semantics and usually have unsatisfactory precision. To resolve the limitations of these two types of methods, we propose an approach of Inferring via Grounding Network Sampling over Selected Instances. We first employ an embedding-based model to perform instance selection, generating much smaller candidate sets for subsequent fact inference; this not only narrows the candidate sets but also filters out some of the noisy instances. Then, we perform inference only within these candidate sets, by running a data-driven inference algorithm on a Markov Logic Network (MLN), called Inferring via Grounding Network Sampling (INS). In this process, we incorporate the similarity prior generated by the embedding-based model into INS to improve inference precision. Experimental results show that our approach improved Hits@1 from 32.911% to 71.692% on the FB15K dataset and achieved much better AP@n evaluations than state-of-the-art methods.
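The instance-selection step can be illustrated with a TransE-style scorer, used here as a representative embedding model (the abstract does not specify the exact model); the planted tail entity and all names are toy assumptions:

```python
import numpy as np

def topk_tails(h, r, E, k):
    """TransE-style candidate selection: score every candidate tail t
    by -||h + r - t|| and keep only the top k, shrinking the candidate
    set handed to the (expensive) MLN inference stage."""
    scores = -np.linalg.norm(E - (h + r), axis=1)
    return np.argsort(-scores)[:k]

rng = np.random.default_rng(1)
E = rng.normal(size=(1000, 50))             # toy entity embeddings
h, r = E[0], rng.normal(size=50)            # a query (head, relation)
E[42] = h + r + 0.01 * rng.normal(size=50)  # plant a near-perfect tail
cands = topk_tails(h, r, E, k=10)           # candidates passed to INS
```

The point of the design is that this scoring pass is a single vectorized distance computation over all entities, whereas logical inference over the full entity set would be intractable.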


Proceedings of the 2014 IEEE/WIC/ACM International Joint Conferences on Web Intelligence (WI) and Intelligent Agent Technologies (IAT) | 2014

Recursive Deep Learning for Sentiment Analysis over Social Data

Changliang Li; Bo Xu; Gaowei Wu; Saike He; Guanhua Tian; Hongwei Hao

Sentiment analysis has become a popular research problem in the NLP field. However, little research has been conducted on sentiment analysis for Chinese; progress is held back by the lack of large labelled corpora and powerful models. To remedy this deficiency, we build a Chinese Sentiment Treebank over social data, comprising 13,550 labeled sentences drawn from movie reviews. Furthermore, we introduce a novel Recursive Neural Deep Model (RNDM) that predicts sentiment labels based on recursive deep learning. We consider the problem of classifying a sentence by overall sentiment, determining whether a review is positive or negative. On predicting sentiment labels at the sentence level, our model outperforms commonly used baselines such as Naïve Bayes, Maximum Entropy, and SVM by a large margin.
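A minimal sketch of recursive composition over a binary parse tree, in the spirit of recursive deep models generally; the weights are random and untrained, and this is not the paper's exact RNDM:

```python
import numpy as np

rng = np.random.default_rng(0)
d = 4
W = rng.normal(0, 0.5, (d, 2 * d))   # composition weights (untrained)
b = np.zeros(d)
U = rng.normal(0, 0.5, (2, d))       # root classifier: positive vs negative

def compose(node, vecs):
    """Bottom-up composition over a binary parse tree: a leaf is a
    word index, an internal node a pair (left, right); each merge is
    p = tanh(W [c_left; c_right] + b)."""
    if isinstance(node, int):
        return vecs[node]
    left, right = node
    return np.tanh(W @ np.concatenate([compose(left, vecs),
                                       compose(right, vecs)]) + b)

vecs = rng.normal(size=(3, d))       # toy word vectors for a 3-word sentence
root = compose(((0, 1), 2), vecs)    # parse: ((w0 w1) w2)
logits = U @ root                    # sentence-level sentiment scores
```

Training would backpropagate a sentiment loss through every composition node; here only the forward pass is shown.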


Pacific-Asia Conference on Knowledge Discovery and Data Mining | 2015

Parallel Recursive Deep Model for Sentiment Analysis

Changliang Li; Bo Xu; Gaowei Wu; Saike He; Guanhua Tian; Yujun Zhou

Sentiment analysis has become a popular research problem in the Artificial Intelligence (AI) and Natural Language Processing (NLP) fields. We introduce a novel Parallel Recursive Deep Model (PRDM) for predicting sentiment label distributions. The main trait of our model is that it not only uses the composition units, i.e., the vectors of words and phrases together with their sentiment labels, but also exploits the information encoded in the structure of sentiment labels, by introducing a sentiment Recursive Neural Network (sentiment-RNN) alongside an RNTN. The two parallel neural networks together compose our novel deep model structure, in which the sentiment-RNN and the RNTN cooperate with each other. On the task of predicting sentiment label distributions, our model outperforms previous state-of-the-art approaches at both the full-sentence level and the phrase level by a large margin.


Intelligence and Security Informatics | 2014

Ranking Online Memes in Emergency Events Based on Transfer Entropy

Saike He; Xiaolong Zheng; Xiuguo Bao; Hongyuan Ma; Daniel Dajun Zeng; Bo Xu; Guanhua Tian; Hongwei Hao

The rapid proliferation of online social networks has greatly boosted the dissemination and evolution of online memes, which can be free text, trending catchphrases, or micro media. However, this information abundance is exceeding the public's capacity to consume it, especially in unusual situations such as emergency management, intelligence acquisition, and crime analysis. A reliable approach is therefore needed to rank memes according to their influence, letting the public focus on the most important memes without sinking into the information flood. However, studying memes in any detail on a large scale proves challenging. Previous bottom-up approaches are often highly complex, while more recent top-down network analysis approaches lack detailed modeling of meme dynamics. In this paper, we first present a formal definition of the meme ranking task, and then introduce a scheme for meme ranking in the context of online social networks (OSN). To the best of our knowledge, this is the first time that memes have been ranked in a model-free manner. Empirical results on two emergency events indicate that our scheme outperforms several benchmark approaches. The scheme is also robust, being insensitive to the sampling rate. In light of the scheme's fine-grained modeling of meme dynamics, we also reveal two key factors affecting meme influence.
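Transfer entropy with history length 1 can be estimated from counts. The sketch below is a generic plug-in estimator for binary time series, not the paper's exact formulation; the copied-with-lag series is a toy example:

```python
import numpy as np
from collections import Counter

def transfer_entropy(x, y):
    """Plug-in transfer entropy TE(X->Y) with history length 1:
    sum over (y_next, y, x) of
    p(y_next, y, x) * log2[ p(y_next | y, x) / p(y_next | y) ]."""
    trip = Counter(zip(y[1:], y[:-1], x[:-1]))
    pair_yx = Counter(zip(y[:-1], x[:-1]))
    pair_yy = Counter(zip(y[1:], y[:-1]))
    marg_y = Counter(y[:-1])
    n = len(y) - 1
    te = 0.0
    for (yn, yp, xp), c in trip.items():
        p_joint = c / n
        p_cond_full = c / pair_yx[(yp, xp)]
        p_cond_y = pair_yy[(yn, yp)] / marg_y[yp]
        te += p_joint * np.log2(p_cond_full / p_cond_y)
    return te

rng = np.random.default_rng(0)
x = rng.integers(0, 2, 5000)   # a driving binary series
y = np.empty_like(x)
y[0] = 0
y[1:] = x[:-1]                 # y copies x with a one-step lag
te_xy = transfer_entropy(x, y) # close to 1 bit: x fully predicts y's next step
te_yx = transfer_entropy(y, x) # close to 0: y adds nothing about x's next step
```

The asymmetry te_xy >> te_yx is what makes transfer entropy usable as a directed influence score for ranking.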


Intelligence and Security Informatics | 2015

Modeling emotion entrainment of online users in emergency events

Saike He; Xiaolong Zheng; Daniel Zeng; Bo Xu; Changliang Li; Guanhua Tian; Lei Wang; Hongwei Hao

Emotion entrainment accounts for the rhythmic convergence of human emotions through social interactions. This phenomenon abounds across disciplines, e.g., effervescence in soccer games, anger proliferation in violent incidents, or anxiety diffusion in disasters. Although emotion entrainment is highly relevant to the quality of human daily life, the principles underpinning this phenomenon are still unclear. Previous dynamic models try to explain the entrainment phenomenon by assuming symmetrical coupling among identical individuals, yet this assumption clearly does not hold in real-world human interactions. We therefore propose an alternative model that captures asymmetric relationships. In depicting the coupling mechanism, the effect of social influence is also encoded. Experimental results on two emergent social events suggest that the proposed model characterizes emotion trends with high accuracy. We also explain the emotion dynamics by analyzing the reconstructed entrainment matrix. Our work may have practical implications for those who want to guide or regulate emotion evolution in emergency events discussed online.
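One toy way to write down asymmetric coupling is a linear entrainment update in which the influence of user j on user i, A[i, j], need not equal A[j, i]; this illustrates the idea only and is not the paper's model:

```python
import numpy as np

def entrain(e, A, steps):
    """Linear entrainment: each user's emotion drifts toward others'
    in proportion to an asymmetric coupling matrix A, where A[i, j]
    is the influence of user j on user i."""
    for _ in range(steps):
        e = e + (A * (e[None, :] - e[:, None])).sum(axis=1)
    return e

e0 = np.array([1.0, -1.0, 0.5])    # initial emotion states of three users
A = np.array([[0.00, 0.05, 0.05],  # asymmetric influence weights
              [0.20, 0.00, 0.10],
              [0.10, 0.05, 0.00]])
e = entrain(e0, A, steps=200)      # converges toward a weighted consensus
```

Because A is asymmetric, the consensus value is pulled toward the more influential users rather than the plain average, which is the qualitative behavior symmetric models cannot express.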


Conference on Intelligent Text Processing and Computational Linguistics | 2015

Short Text Hashing Improved by Integrating Multi-Granularity Topics and Tags

Jiaming Xu; Bo Xu; Guanhua Tian; Jun Zhao; Fangyuan Wang; Hongwei Hao

Due to the computational and storage efficiency of compact binary codes, hashing has been widely used for large-scale similarity search. Unfortunately, many existing hashing methods based on observed keyword features are not effective for short texts due to their sparseness and shortness. Recently, some researchers have tried to utilize latent topics of a certain granularity to preserve semantic similarity in hash codes beyond keyword matching. However, topics of a single granularity are not adequate to represent the intrinsic semantic information. In this paper, we present a novel unified approach for short text Hashing using Multi-granularity Topics and Tags, dubbed HMTT. In particular, we propose a selection method to choose the optimal multi-granularity topics depending on the type of dataset, and design two distinct hashing strategies to incorporate multi-granularity topics. We also propose a simple and effective method to exploit tags to enhance the similarity of related texts. We carry out extensive experiments on one short text dataset as well as one normal text dataset. The results demonstrate that our approach is effective and significantly outperforms baselines on several evaluation metrics.


International Conference on Neural Information Processing | 2014

Short Text Hashing Improved by Integrating Topic Features and Tags

Jiaming Xu; Bo Xu; Jun Zhao; Guanhua Tian; Heng Zhang; Hongwei Hao

Hashing, as an efficient approach, has been widely used for large-scale similarity search. Unfortunately, many existing hashing methods based on observed keyword features are not effective for short texts due to their sparseness and shortness. Recently, some researchers have tried to construct semantic relationships using topics of a certain granularity. However, topics of a single granularity are insufficient to preserve optimal semantic similarity across different types of datasets. On the other hand, tag information should be fully exploited to enhance the similarity of related texts. We therefore propose a novel unified hashing approach in which the optimal topic features are selected automatically and integrated with the original features to preserve similarity, while tags are fully utilized to improve hash code learning. We carried out extensive experiments on one short text dataset as well as one normal text dataset. The results demonstrate that our approach is effective and significantly outperforms baseline methods on several evaluation metrics.
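A hashing sketch with sign random projections standing in for the learned hash function; the point illustrated is the concatenation of keyword and topic features before code generation, and all data and names are toy assumptions:

```python
import numpy as np

rng = np.random.default_rng(0)

def hash_codes(keyword_feats, topic_feats, n_bits):
    """Stand-in for learned hashing: concatenate keyword and topic
    features, then sign random projection down to n_bits binary codes."""
    X = np.hstack([keyword_feats, topic_feats])
    R = rng.normal(size=(X.shape[1], n_bits))
    return (X @ R > 0).astype(np.uint8)

def hamming_search(query_code, codes, k):
    """Return indices of the k codes nearest in Hamming distance."""
    d = (codes != query_code).sum(axis=1)
    return np.argsort(d)[:k]

# Toy corpus: 100 'texts' with 20 keyword dims and 5 topic dims.
kw = rng.normal(size=(100, 20))
tp = rng.normal(size=(100, 5))
kw[1], tp[1] = kw[0] + 0.01, tp[0] + 0.01   # text 1 nearly duplicates text 0
codes = hash_codes(kw, tp, n_bits=32)
nn = hamming_search(codes[0], codes, k=2)   # near-duplicate is retrieved
```

Hamming distance on compact codes is what makes the search cheap at scale: it reduces to bitwise XOR and popcount rather than dense vector arithmetic.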

Collaboration


Dive into Guanhua Tian's collaborations.

Top Co-Authors

Bo Xu (Chinese Academy of Sciences)
Hongwei Hao (Chinese Academy of Sciences)
Jiaming Xu (Chinese Academy of Sciences)
Jun Zhao (Chinese Academy of Sciences)
Saike He (Chinese Academy of Sciences)
Changliang Li (Chinese Academy of Sciences)
Fangyuan Wang (Chinese Academy of Sciences)
Peng Wang (Chinese Academy of Sciences)
Xiaolong Zheng (Chinese Academy of Sciences)
Daniel Zeng (Chinese Academy of Sciences)