
Publication


Featured research published by Heyan Huang.


International Conference on Tools with Artificial Intelligence | 2014

Tri-Rank: An Authority Ranking Framework in Heterogeneous Academic Networks by Mutual Reinforce

Zhirun Liu; Heyan Huang; Xiaochi Wei; Xian-Ling Mao

Recently, authority ranking has received increasing interest in both academia and industry; it is applicable to many problems, such as discovering influential nodes and building recommendation systems. Various graph-based ranking approaches such as PageRank have been used to rank authors and papers separately in homogeneous networks. In this paper, we take venue information into consideration and propose a novel graph-based ranking framework, Tri-Rank, to co-rank authors, papers and venues simultaneously in heterogeneous networks. The framework is flexible: it ranks authors, papers and venues iteratively in a mutually reinforcing way to achieve a more comprehensive and fair ranking result. We conduct extensive experiments on data collected from the ACM Digital Library. The results show that Tri-Rank is more effective and efficient than state-of-the-art baselines, including PageRank, HITS and Co-Rank, at ranking authors. The papers and venues ranked by Tri-Rank also demonstrate that its results are reasonable.
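The mutual-reinforcement idea can be illustrated with a small sketch: paper scores feed author and venue scores, which feed back into paper scores, iterated to a fixed point. The toy graph, update rules and normalization below are illustrative assumptions, not the paper's exact Tri-Rank equations.

```python
from collections import defaultdict

# Toy heterogeneous academic graph (made up for illustration):
# papers link to their authors, their venue, and the papers they cite.
paper_authors = {"p1": ["a1", "a2"], "p2": ["a2"], "p3": ["a1", "a3"]}
paper_venue = {"p1": "v1", "p2": "v1", "p3": "v2"}
paper_cites = {"p1": ["p2"], "p2": [], "p3": ["p1"]}

def normalize(scores):
    total = sum(scores.values()) or 1.0
    return {k: v / total for k, v in scores.items()}

def tri_rank_sketch(iters=50):
    authors = {a for lst in paper_authors.values() for a in lst}
    venues = set(paper_venue.values())
    papers = set(paper_authors)
    a_score = {a: 1.0 / len(authors) for a in authors}
    v_score = {v: 1.0 / len(venues) for v in venues}
    p_score = {p: 1.0 / len(papers) for p in papers}
    for _ in range(iters):
        # A paper is reinforced by its citing papers, its authors and its venue.
        new_p = {}
        for p in papers:
            cited_by = sum(p_score[q] for q in papers if p in paper_cites[q])
            new_p[p] = (cited_by
                        + sum(a_score[a] for a in paper_authors[p])
                        + v_score[paper_venue[p]])
        # Authors and venues are reinforced by the papers they write/publish.
        new_a, new_v = defaultdict(float), defaultdict(float)
        for p in papers:
            for a in paper_authors[p]:
                new_a[a] += new_p[p]
            new_v[paper_venue[p]] += new_p[p]
        p_score = normalize(new_p)
        a_score = normalize(dict(new_a))
        v_score = normalize(dict(new_v))
    return a_score, p_score, v_score
```

Normalizing after every pass keeps the three score vectors on a comparable scale, which is what lets the three node types reinforce each other without any one of them diverging.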


Science in China Series F: Information Sciences | 2016

A novel unsupervised method for new word extraction

Lili Mei; Heyan Huang; Xiaochi Wei; Xian-Ling Mao

New words can benefit many NLP tasks such as sentence chunking and sentiment analysis. However, automatic new word extraction is challenging because new words usually follow no fixed language pattern and may even appear as new meanings of existing words. To tackle these problems, this paper proposes a novel method to extract new words. It not only considers domain specificity but also combines multiple kinds of statistical language knowledge. First, we apply a filtering algorithm to obtain a candidate list of new words. Then, we employ statistical language knowledge to extract the top-ranked new words. Experimental results show that our method extracts a large number of new words in both Chinese and English corpora and notably outperforms state-of-the-art methods. Moreover, our method increases the accuracy of Chinese word segmentation by 10% on corpora containing new words. Highlights: 1. This paper proposes a new word extraction method based on domain specificity and statistical language knowledge. First, a domain-specificity-based filtering step removes garbage strings to obtain a candidate list of new words; then new words are extracted using statistical language knowledge (word frequency, cohesion and degree of freedom). Experiments verify the method's effectiveness, language independence and domain independence. 2. The method can effectively improve the performance of Chinese word segmentation systems.
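The three statistics the highlights name (frequency, cohesion, degree of freedom) are commonly computed as raw counts, pointwise mutual information between a candidate's halves, and the entropy of its neighboring characters. The sketch below computes them for character bigrams only; the paper's exact scoring and filtering may differ.

```python
import math
from collections import Counter

def candidate_stats(text, n=2):
    """For each length-n substring, return (frequency, cohesion, freedom):
    cohesion = PMI between the two halves, freedom = min entropy of the
    characters adjacent to its occurrences. An illustrative sketch only."""
    chars = Counter(text)
    grams = Counter(text[i:i + n] for i in range(len(text) - n + 1))
    total = sum(chars.values())

    def entropy(counter):
        s = sum(counter.values())
        return -sum(v / s * math.log(v / s) for v in counter.values()) if s else 0.0

    stats = {}
    for g, freq in grams.items():
        # Cohesion: pointwise mutual information between the two characters.
        p_g = freq / max(total - n + 1, 1)
        pmi = math.log(p_g / ((chars[g[0]] / total) * (chars[g[1]] / total)))
        # Freedom: entropy of the left/right neighbors over all occurrences.
        left, right = Counter(), Counter()
        start = text.find(g)
        while start != -1:
            if start > 0:
                left[text[start - 1]] += 1
            if start + n < len(text):
                right[text[start + n]] += 1
            start = text.find(g, start + 1)
        stats[g] = (freq, pmi, min(entropy(left), entropy(right)))
    return stats
```

A real new-word candidate tends to score high on all three: it occurs often, its parts stick together (high PMI), and it appears in varied contexts (high boundary entropy).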


Neurocomputing | 2016

Topic-related Chinese message sentiment analysis

Chun Liao; Chong Feng; Sen Yang; Heyan Huang

Because sentiment analysis of microblogs plays an important role in behavior analysis of social media, there has been significant progress in this area recently. However, most studies ignore topics and neglect the sentiment orientation towards different topics. We propose two combined methods for topic-related Chinese message sentiment analysis. One is a graph-based ranking model, LT-IGT, which takes both local and global topical information into consideration. The other explores sentiment features over topical words expanded with word embeddings, considering both syntactic and semantic information. These two methods are integrated into a topic-related Chinese message sentiment classifier. Experimental results on the SIGHAN8 dataset show that this approach outperforms other well-known methods for sentiment analysis of topic-related Chinese messages.


CCL | 2015

EHLLDA: A Supervised Hierarchical Topic Model

Xian-Ling Mao; Yixuan Xiao; Qiang Zhou; Jun Wang; Heyan Huang

In this paper, we consider the problem of modeling hierarchically labeled data, such as Web pages and their placement in hierarchical directories. The state-of-the-art model, hierarchical Labeled LDA (hLLDA), assumes that each child of a non-leaf label has equal importance and that a document in the corpus cannot be located at a non-leaf node. In most cases, however, these assumptions do not match the actual situation. Thus, in this paper, we introduce a supervised hierarchical topic model, Extended Hierarchical Labeled Latent Dirichlet Allocation (EHLLDA), which aims to relax the assumptions of hLLDA by incorporating prior information about labels into hLLDA. The experimental results show that the perplexity of EHLLDA is always better than that of LLDA and hLLDA on all four datasets, and our proposed model is also superior to hLLDA in terms of p@n.


Journal of Computer Science and Technology | 2015

When Factorization Meets Heterogeneous Latent Topics: An Interpretable Cross-Site Recommendation Framework

Xin Xin; Chin-Yew Lin; Xiaochi Wei; Heyan Huang

Data sparsity is a well-known challenge in recommender systems. Previous studies alleviate this problem by incorporating information from the corresponding social media site; in this paper, we address it by exploring cross-site information. Specifically, we examine: 1) how to effectively and efficiently utilize cross-site ratings and content features to improve recommendation performance, and 2) how to make recommendations interpretable by utilizing content features. We propose a joint model of matrix factorization and latent topic analysis. Heterogeneous content features are modeled by multiple kinds of latent topics, and the combination of matrix factorization and latent topics makes the recommendation results interpretable, so the two issues are solved simultaneously. On a real-world dataset collecting user behaviors from three social media sites, we demonstrate that the proposed model is effective in improving recommendation performance and in interpreting the rationale of ratings.
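The matrix-factorization half of such a joint model can be sketched in a few lines: learn user and item factor vectors whose dot product approximates the observed ratings. The coupling with latent topics over content features, which is what makes the paper's model cross-site and interpretable, is omitted here; this is only the plain MF baseline under assumed toy hyperparameters.

```python
import random

def factorize(ratings, k=2, steps=200, lr=0.02, reg=0.05, seed=0):
    """Plain matrix factorization via SGD: predict r_ui ~ p_u . q_i.
    `ratings` is a list of (user, item, rating) triples."""
    rng = random.Random(seed)
    users = {u for u, _, _ in ratings}
    items = {i for _, i, _ in ratings}
    # Small random initialization so the factors can bootstrap from the error.
    P = {u: [rng.gauss(0, 0.1) for _ in range(k)] for u in users}
    Q = {i: [rng.gauss(0, 0.1) for _ in range(k)] for i in items}
    for _ in range(steps):
        for u, i, r in ratings:
            pred = sum(pu * qi for pu, qi in zip(P[u], Q[i]))
            err = r - pred
            # Gradient step with L2 regularization on both factor vectors.
            for f in range(k):
                pu, qi = P[u][f], Q[i][f]
                P[u][f] += lr * (err * qi - reg * pu)
                Q[i][f] += lr * (err * pu - reg * qi)
    return P, Q
```

In the paper's setting, the item factors would additionally be tied to topic distributions over content features, which is what allows a rating prediction to be explained in terms of topics rather than opaque latent dimensions.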


Chinese National Conference on Social Media Processing | 2014

A Hybrid Method of Domain Lexicon Construction for Opinion Targets Extraction Using Syntax and Semantics

Chun Liao; Chong Feng; Sen Yang; Heyan Huang

Because opinion target extraction from Chinese microblogs plays an important role in opinion mining, there has been significant progress in this area recently, especially with CRF-based methods. However, these methods take only lexical features into consideration and do not exploit the implicit semantic and syntactic knowledge. We propose a new approach that incorporates a domain lexicon with groups of syntactic and semantic features. The approach acquires the domain lexicon through a novel method, PDSP, and then combines the lexicon with the opinion targets extracted by a CRF using these feature groups. Experimental results on the COAE2014 dataset show that this approach notably outperforms other opinion target extraction baselines.


IEEE Transactions on Knowledge and Data Engineering | 2018

Leveraging Conceptualization for Short-Text Embedding

Heyan Huang; Yashen Wang; Chong Feng; Zhirun Liu; Qiang Zhou

Most short-text embedding models represent each short text using only the literal meanings of its words, which makes them indiscriminative in the face of ubiquitous polysemy. To enhance the semantic representation of short texts, we (i) propose a novel short-text conceptualization algorithm that assigns associated concepts to each short text, and then (ii) introduce the conceptualization results into the learning of conceptual short-text embeddings. This semantic representation is more expressive than widely used text representation models such as the latent topic model. The short-text conceptualization algorithm is based on a novel co-ranking framework that lets the signals (i.e., the words and the concepts) fully interplay to derive a solid conceptualization of each short text. We further extend the conceptual short-text embedding models with an attention-based model that selects the relevant words within the context to make predictions more efficiently. Experiments on real-world datasets demonstrate that the proposed conceptual short-text embedding model and short-text conceptualization algorithm are more effective than state-of-the-art methods.


Web-Age Information Management | 2017

Aligning Gaussian-Topic with Embedding Network for Summarization Ranking

Linjing Wei; Heyan Huang; Yang Gao; Xiaochi Wei; Chong Feng

Query-oriented summarization addresses the problem of information overload and helps people grasp the main ideas within a short time. Summaries are composed of sentences, so the basic idea behind composing a salient summary is to construct sentences of high quality with respect to both user queries and multiple documents. Sentence embedding has been shown to be effective in summarization tasks, but such methods lack the latent topic structure of the content, and a summary built only in vector space can hardly capture multi-topical content. In this paper, our proposed model incorporates topical aspects and continuous vector representations, jointly learning semantically rich representations encoded as vectors. Leveraging a topic filtering and embedding ranking model, the summarizer then selects desirable, salient sentences. Experiments demonstrate the outstanding performance of our model in terms of prominent topics and semantic coherence.


IEEE Transactions on Knowledge and Data Engineering | 2017

I Know What You Want to Express: Sentence Element Inference by Incorporating External Knowledge Base

Xiaochi Wei; Heyan Huang; Liqiang Nie; Hanwang Zhang; Xian-Ling Mao; Tat-Seng Chua

Sentence auto-completion is an important feature that saves users many keystrokes by providing suggestions as they type. Despite its value, existing sentence auto-completion methods, such as query completion models, can hardly be applied to the object completion problem in sentences of the form (subject, verb, object), due to complex natural language descriptions and data deficiency. Toward this goal, we treat an SVO sentence as a three-element triple (subject, sentence pattern, object) and cast sentence object completion as an element inference problem. The elements of all triples are encoded into a unified low-dimensional embedding space by our proposed TRANSFER model, which leverages an external knowledge base to strengthen representation learning. With such representations, a linear model can provide reliable candidates for the missing element. Extensive experiments on a real-world dataset validate our model. We have also successfully applied it to answer candidate selection in factoid question answering systems, which further demonstrates the applicability of the TRANSFER model.
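Element inference over (subject, sentence pattern, object) triples can be pictured with a translation-style embedding, where a plausible object should lie near subject + pattern in the embedding space. TRANSFER's actual objective and its knowledge-base terms are in the paper; the 2-d vectors and names below are hypothetical and only illustrate the triple-completion idea.

```python
import math

def score(subj, pattern, obj):
    """Translation-style plausibility: a lower distance ||s + p - o||
    means the object fits the (subject, pattern) context better."""
    return math.sqrt(sum((s + p - o) ** 2 for s, p, o in zip(subj, pattern, obj)))

def complete_object(subj, pattern, candidates):
    """Rank candidate objects for (subject, sentence pattern, ?) and
    return the best-fitting one."""
    return min(candidates, key=lambda name: score(subj, pattern, candidates[name]))

# Hypothetical 2-d embeddings, chosen so "coffee" lies at subj + pattern.
subj = [0.2, 0.5]        # "he"
pattern = [0.6, -0.1]    # "__ drinks __"
candidates = {"coffee": [0.8, 0.4], "keyboard": [-0.5, 0.9]}
```

The same scoring view explains the question-answering application: each answer candidate is scored as the missing element of the question's triple, and the nearest one is selected.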


Chinese National Conference on Social Media Processing | 2016

Query Intent Detection Based on Clustering of Phrase Embedding

Jiahui Gu; Chong Feng; Xiong Gao; Yashen Wang; Heyan Huang

Understanding ambiguous or multi-faceted search queries is essential for information retrieval. The task of identifying the major aspects or senses of a query can be viewed as query intent detection, where the intents are represented as a number of clusters. The challenge is thus how to generate intent candidates and group them semantically. This paper explores the combined power of lexical statistics and embedding methods. First, a novel term expansion algorithm is designed to sketch all possible intent candidates. Then, an efficient query intent generation model is proposed that learns latent representations for intent candidates via embedding-based methods; the vectorized intent candidates are clustered, and the clusters are detected as query intents. Experimental results on the NTCIR-12 IMine-2 corpus show that the query intent generation model via phrase embedding significantly outperforms state-of-the-art clustering algorithms in query intent detection.

Collaboration


Dive into Heyan Huang's collaboration.

Top Co-Authors

Xiaochi Wei, Beijing Institute of Technology
Chong Feng, Beijing Institute of Technology
Xian-Ling Mao, Beijing Institute of Technology
Chun Liao, Beijing Institute of Technology
Sen Yang, Beijing Institute of Technology
Xin Xin, Beijing Institute of Technology
Zhirun Liu, Beijing Institute of Technology
Qiang Zhou, Beijing Institute of Technology
Yashen Wang, Beijing Institute of Technology