Chong Feng

Beijing Institute of Technology

Publication

Featured research published by Chong Feng.

Meeting of the Association for Computational Linguistics | 2016

CSE: Conceptual Sentence Embeddings based on Attention Model

Yashen Wang; Heyan Huang; Chong Feng; Qiang Zhou; Jiahui Gu; Xiong Gao

Most sentence embedding models represent each sentence using only the surface forms of its words, which leaves them unable to discriminate among the ubiquitous cases of homonymy and polysemy. To enhance the representation capability of sentences, we employ a conceptualization model to assign associated concepts to each sentence in the text corpus, and then learn conceptual sentence embeddings (CSE). This semantic representation is therefore more expressive than some widely used text representation models such as the latent topic model, especially for short text. Moreover, we extend the CSE models with a local attention-based model that selects relevant words within the context to make prediction more efficient. In the experiments, we evaluate the CSE models on two tasks, text classification and information retrieval. The experimental results show that the proposed models outperform typical sentence embedding models.
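The idea of weighting context words by relevance can be illustrated with a generic softmax attention step. This is only a minimal sketch, not the model from the paper: the abstract does not say how relevance is scored, so the function name `attend`, the `scores` input, and the toy vectors are all assumptions made for illustration.

```python
import math

def attend(context_vectors, scores):
    """Return softmax weights over context words and their weighted average.

    `scores` are hypothetical relevance scores, one per context word; how the
    paper actually computes them is not stated in the abstract.
    """
    top = max(scores)  # subtract the maximum for numerical stability
    exps = [math.exp(s - top) for s in scores]
    total = sum(exps)
    weights = [e / total for e in exps]
    dim = len(context_vectors[0])
    # Weighted average of the context word vectors, one dimension at a time.
    pooled = [
        sum(w * vec[i] for w, vec in zip(weights, context_vectors))
        for i in range(dim)
    ]
    return weights, pooled

# Two toy 2-dimensional word vectors with equal scores get equal weight.
weights, pooled = attend([[1.0, 0.0], [0.0, 1.0]], [0.0, 0.0])
```

With equal scores the result is the plain average of the context vectors; a much higher score for one word pushes nearly all of the weight onto that word.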

Social Informatics | 2013

Micro-blog Post Topic Drift Detection Based on LDA Model

Quanchao Liu; Heyan Huang; Chong Feng

Micro-blog posts cover a large number of topics and contain much useful information alongside much junk, which makes micro-blog post topics highly prone to drift. The two main aspects of this drift are the change of topics over time and the noise introduced as the number of posts grows. We propose a topic drift detection method based on the LDA model. It uses the Gibbs sampling algorithm to obtain the probability distribution of micro-blog post words based on word correlation, identifies topic boundaries with a dynamic constant method, extracts topic words by computing lexical information entropy in the topic field, and detects topic drift by aligning topic word sequences under a discrete-time model. Our experiments show that the method is very effective at detecting topic drift in micro-blog posts.
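The entropy step can be made concrete with a small sketch. The abstract does not define "lexical information entropy" precisely, so the code below assumes the ordinary Shannon entropy of a word's occurrence counts across topics; the function name `topic_entropy` and the sample counts are hypothetical.

```python
import math

def topic_entropy(counts):
    """Shannon entropy (in bits) of a word's occurrence counts across topics.

    Assumption: a word concentrated in one topic (low entropy) is a better
    topic-word candidate than a word spread evenly over all topics.
    """
    total = sum(counts)
    probs = [c / total for c in counts if c > 0]  # skip zero counts
    return -sum(p * math.log2(p) for p in probs)

spread_out = topic_entropy([5, 5, 5, 5])  # evenly spread over four topics
focused = topic_entropy([8, 0, 0])        # appears in a single topic only
```

Under this assumption, a word that is spread evenly over four topics has an entropy of 2 bits, while a word confined to one topic has an entropy of 0.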

Asia-Pacific Web Conference | 2015

A Co-ranking Framework to Select Optimal Seed Set for Influence Maximization in Heterogeneous Network

Yashen Wang; Heyan Huang; Chong Feng; Xianxiang Yang

The rising popularity of social media presents new opportunities for one of the enterprise's most important needs: selecting the most influential individuals for viral marketing, a problem that has attracted increasing attention in both academia and industry. Most recent influence maximization algorithms have demonstrated remarkable success, but their applications are limited to homogeneous networks. In this paper, we formulate the problem of influence maximization in heterogeneous networks and propose a co-ranking framework that simultaneously selects seed sets of different types. This framework is flexible and can take full advantage of the additional information implicit in the heterogeneous structure. We conduct extensive experiments using data collected from the ACM Digital Library, and the experimental results show that both the quality and the running time of the proposed algorithm rival those of existing algorithms.

IEEE Transactions on Knowledge and Data Engineering | 2018

Leveraging Conceptualization for Short-Text Embedding

Heyan Huang; Yashen Wang; Chong Feng; Zhirun Liu; Qiang Zhou

Most short-text embedding models represent each short text using only the literal meanings of its words, which leaves them unable to discriminate among the ubiquitous cases of polysemy. To enhance the semantic representation capability of short texts, we (i) propose a novel short-text conceptualization algorithm to assign associated concepts to each short text, and then (ii) introduce the conceptualization results into the learning of conceptual short-text embeddings. This semantic representation is therefore more expressive than some widely used text representation models such as the latent topic model. The short-text conceptualization algorithm is based on a novel co-ranking framework that lets the signals (i.e., the words and the concepts) fully interact to derive a solid conceptualization of the short texts. We then extend the conceptual short-text embedding models with an attention-based model that selects the relevant words within the context to make prediction more efficient. Experiments on real-world datasets demonstrate that the proposed conceptual short-text embedding model and short-text conceptualization algorithm are more effective than state-of-the-art methods.

Web Age Information Management | 2017

Aligning Gaussian-Topic with Embedding Network for Summarization Ranking

Linjing Wei; Heyan Huang; Yang Gao; Xiaochi Wei; Chong Feng

Query-oriented summarization addresses the problem of information overload and helps people grasp the main ideas within a short time. Summaries are composed of sentences, so the basic idea of composing a salient summary is to select quality sentences with respect to both user-specific queries and multiple documents. Sentence embedding has been shown to be effective in summarization tasks. However, these methods lack the latent topic structure of the content, so a summary that relies only on the vector space can hardly capture multi-topical content. In this paper, our proposed model incorporates both topical aspects and continuous vector representations, jointly learning semantically rich representations encoded as vectors. Then, using topic filtering and an embedding ranking model, the summarizer can select the desired salient sentences. Experiments demonstrate outstanding performance of our proposed model in terms of prominent topics and semantic coherence.

Web Age Information Management | 2016

Conceptual Sentence Embeddings

Yashen Wang; Heyan Huang; Chong Feng; Qiang Zhou; Jiahui Gu

Most sentence embedding models represent each sentence using only the surface forms of its words, which leaves them unable to discriminate among the ubiquitous cases of homonymy and polysemy. To enhance discriminativeness, we employ a conceptualization model to assign associated concepts to each sentence in the text corpus, and learn conceptual sentence embeddings (CSE). The sentence representations are therefore more expressive than some widely used document representation models such as latent topic models, especially for short text. In the experiments, we evaluate the CSE models on two tasks, text classification and information retrieval. The experimental results show that the proposed models outperform typical sentence embedding models.

Chinese National Conference on Social Media Processing | 2016

Query Intent Detection Based on Clustering of Phrase Embedding

Jiahui Gu; Chong Feng; Xiong Gao; Yashen Wang; Heyan Huang

Understanding ambiguous or multi-faceted search queries is essential for information retrieval. The task of identifying the major aspects or senses of queries can be viewed as the detection of query intents, where the intents are represented as a number of clusters. The challenging issue in this task is therefore how to generate intent candidates and group them semantically. This paper explores the competence of lexical statistics and embedding methods. First, a novel term expansion algorithm is designed to sketch all possible intent candidates. Next, an efficient query intent generation model is proposed, which learns latent representations of the intent candidates via embedding-based methods. The vectorized intent candidates are then clustered and detected as query intents. Experimental results on the NTCIR-12 IMine-2 corpus show that the query intent generation model via phrase embedding significantly outperforms state-of-the-art clustering algorithms in query intent detection.

Web Age Information Management | 2015

Community Detection Based on Minimum-Cut Graph Partitioning

Yashen Wang; Heyan Huang; Chong Feng; Zhirun Liu

One of the most useful measures of community detection quality is modularity, which evaluates how far a given division deviates from an expected random graph. This article demonstrates that (i) modularity maximization can be transformed into versions of standard minimum-cut graph partitioning, and (ii) the normalized version of modularity maximization is identical to normalized-cut graph partitioning. In addition, we combine modularity theory with a popular statistical inference method in two ways: (i) transforming the statistical model into the null model used in modularity maximization; (ii) adapting the objective function of the statistical inference method for our optimization. Based on these demonstrations, this paper proposes an efficient community detection algorithm that adapts the Laplacian spectral partitioning algorithm. Experiments on both real-world and synthetic networks show that both the quality and the running time of the proposed algorithm rival those of the previous best algorithms.
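As a concrete illustration of the modularity measure, the sketch below computes the standard Newman modularity Q = Σ_c [L_c / m − (d_c / 2m)²] for an undirected, unweighted graph, where L_c is the number of edges inside community c, d_c is the total degree of its nodes, and m is the number of edges. It is not the spectral algorithm proposed in the paper; the function name `modularity` and the toy graph are assumptions chosen for illustration.

```python
from collections import defaultdict

def modularity(edges, community):
    """Newman modularity of a partition of an undirected, unweighted graph.

    edges: list of (u, v) pairs, each undirected edge listed once.
    community: dict mapping each node to its community label.
    """
    m = len(edges)
    internal = defaultdict(int)    # edges with both endpoints in community c
    degree_sum = defaultdict(int)  # total degree of the nodes in community c
    for u, v in edges:
        degree_sum[community[u]] += 1
        degree_sum[community[v]] += 1
        if community[u] == community[v]:
            internal[community[u]] += 1
    return sum(
        internal[c] / m - (degree_sum[c] / (2 * m)) ** 2 for c in degree_sum
    )

# Toy graph: two triangles joined by the single edge (2, 3).
edges = [(0, 1), (1, 2), (0, 2), (3, 4), (4, 5), (3, 5), (2, 3)]
by_triangle = {0: "a", 1: "a", 2: "a", 3: "b", 4: "b", 5: "b"}
q = modularity(edges, by_triangle)
```

Splitting this toy graph into its two triangles gives Q = 5/14 (about 0.357), whereas placing all six nodes in a single community gives Q = 0, which is why the two-triangle division counts as the better one.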

Web Age Information Management | 2014

Chinese Evaluation Phrase Extraction Based on Cascaded Model

Yashen Wang; Chong Feng; Quanchao Liu; Heyan Huang

With the development of social media, users generate massive numbers of reviews every day. The extraction of evaluative information, including the opinion holder, the comment target and the evaluation phrase, is an important preliminary task for opinion analysis and is in great demand, especially for Chinese text. This paper proposes an efficient method for extracting Chinese evaluation phrases based on a cascaded model and makes three main contributions: (i) to implement and evaluate the method, we construct an original annotated corpus of Chinese evaluation phrases in the automobile domain; (ii) we use Conditional Random Fields to identify evaluation phrases with a simple structure; (iii) we design three kinds of rule-based methods (parenthesis, preposition and adverb phrase rules) to extract evaluation phrases with a complex structure. According to the experimental results, the proposed method performs well. It also contributes greatly to our subsequent tasks, such as sentiment analysis of social media.

Pacific Asia Conference on Language Information and Computation | 2012