Baoxun Wang

Harbin Institute of Technology

Publication

Featured research published by Baoxun Wang.

international joint conference on natural language processing | 2015

Predicting Polarities of Tweets by Composing Word Embeddings with Long Short-Term Memory

Xin Wang; Yuanchao Liu; Chengjie Sun; Baoxun Wang; Xiaolong Wang

In this paper, we introduce a Long Short-Term Memory (LSTM) recurrent network for Twitter sentiment prediction. With the help of gates and constant error carousels in the memory block structure, the model can handle interactions between words through a flexible compositional function. Experiments on a public noisy labelled dataset show that our model outperforms several feature-engineering approaches, with results comparable to the current best data-driven technique. According to the evaluation on a generated negation phrase test set, the proposed architecture doubles the performance of a non-neural model based on bag-of-words features. Furthermore, words with special functions (such as negation and transition) are distinguished, and the dissimilarities of words with opposite sentiment are magnified. A case study on negation expression processing shows the promising potential of the architecture for dealing with complex sentiment phrases.

systems, man and cybernetics | 2009

Extracting Chinese question-answer pairs from online forums

Baoxun Wang; Bingquan Liu; Chengjie Sun; Xiaolong Wang; Lin Sun

Extracting question-answer pairs from online forums is meaningful work because of the huge amount of valuable user-generated resources contained in forums. In this paper we consider the problem of extracting Chinese question-answer pairs for the first time. We present a strategy to detect Chinese questions and their answers. We propose a sequential rule based method to find questions in a forum thread, then adopt non-textual features based on forum structure to improve the performance of answer detection in the same thread. Experimental results show that our techniques are very effective.

ACM Transactions on Asian Language Information Processing | 2011

Deep Learning Approaches to Semantic Relevance Modeling for Chinese Question-Answer Pairs

Baoxun Wang; Bingquan Liu; Xiaolong Wang; Chengjie Sun; Deyuan Zhang

The human-generated question-answer pairs in Web social communities are of great value for research on automatic question-answering techniques. Due to the large amount of noisy information in such corpora, detecting the answers remains a problem even when the questions are exactly located. Quantifying the semantic relevance between questions and their candidate answers is essential to answer detection in social media corpora. Since both the questions and their answers usually contain a small number of sentences, the relevance modeling methods have to overcome the problem of word feature sparsity. In this article, the deep learning principle is introduced to address the semantic relevance modeling task. Two deep belief networks with different architectures are proposed to model the semantic relevance for the question-answer pairs. Based on an investigation of the textual similarity between the community-driven question-answering (cQA) dataset and the forum dataset, a learning strategy is adopted to improve our models’ performance on the social community corpora without hand annotation. The experimental results show that our method outperforms the traditional approaches on both the cQA and the forum corpora.

international symposium on neural networks | 2017

Incorporating loose-structured knowledge into conversation modeling via recall-gate LSTM

Zhen Xu; Bingquan Liu; Baoxun Wang; Chengjie Sun; Xiaolong Wang

It is critical for automatic chat-bots to gain the ability of conversation comprehension, which is essential for providing context-aware responses and conducting smooth dialogues with human beings. As the basis of this task, conversation modeling benefits notably from background knowledge, since such knowledge carries semantic hints that help to clarify the relationships between sentences within a conversation. In this paper, a deep neural network is proposed to incorporate background knowledge for conversation modeling. Through a recall mechanism with a specially designed recall-gate, background knowledge as global memory can cooperate with the local cell memory of Long Short-Term Memory (LSTM), so as to enrich the ability of LSTM to capture the implicit semantic clues in conversations. In addition, this paper introduces loose-structured domain knowledge as background knowledge, which can be built with a small amount of manual work and easily adopted by the recall-gate. Our model is evaluated on the context-oriented response selection task, and experimental results on two datasets show that our approach is promising for modeling conversations and building key components of automatic chat systems.

empirical methods in natural language processing | 2017

Neural Response Generation via GAN with an Approximate Embedding Layer

Zhen Xu; Bingquan Liu; Baoxun Wang; Chengjie Sun; Xiaolong Wang; Zhuoran Wang; Chao Qi

This paper presents a Generative Adversarial Network (GAN) to model single-turn short-text conversations, which trains a sequence-to-sequence (Seq2Seq) network for response generation simultaneously with a discriminative classifier that measures the differences between human-produced responses and machine-generated ones. In addition, the proposed method introduces an approximate embedding layer to solve the non-differentiability problem caused by the sampling-based output decoding procedure in the Seq2Seq generative model. The GAN setup provides an effective way to avoid non-informative responses (a.k.a. “safe responses”), which are frequently observed in traditional neural response generators. The experimental results show that the proposed approach significantly outperforms existing neural response generation models in diversity metrics, with slight increases in relevance scores as well, when evaluated on both a Mandarin corpus and an English corpus.
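
The abstract does not spell out how the approximate embedding layer is built. As a rough illustration only, the sketch below assumes one common way to keep decoding differentiable: replace the sampled token's embedding with the probability-weighted average of all word embeddings. The function names and the toy embedding table are hypothetical and are not taken from the paper.

```python
import math


def softmax(logits):
    """Numerically stable softmax over a list of scores."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def approximate_embedding(logits, embedding_table):
    """Expected embedding under the decoder's output distribution.

    Assumption (not from the abstract): instead of sampling one token,
    which blocks gradients, take the softmax-weighted sum of all rows
    of the embedding table.
    """
    probs = softmax(logits)
    dim = len(embedding_table[0])
    return [
        sum(p * row[d] for p, row in zip(probs, embedding_table))
        for d in range(dim)
    ]


# Toy vocabulary of two words with 2-dimensional embeddings.
table = [[1.0, 0.0], [0.0, 1.0]]
uniform = approximate_embedding([0.0, 0.0], table)
peaked = approximate_embedding([50.0, 0.0], table)
```

With equal logits the result is the midpoint of the two embeddings; with a sharply peaked distribution it approaches the embedding of the most likely word, which is what a sampled token would usually yield.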

international conference on machine learning and cybernetics | 2013

Exploring social features for answer quality prediction in CQA portals

Haifeng Hu; Bingquan Liu; Baoxun Wang; Ming Liu; Xiaolong Wang

The popularity of community based Question Answering (cQA) portals gives rise to the fact that the quality of answer content usually ranges from very high to very low. In this paper, we exploit social features and topic based features to address a key issue in Chinese cQA portals: predicting the answer quality. Different from previous work, we first investigate and analyze the answers of Baidu Zhidao based on the social features extracted from different aspects. Thereafter, we build a predictive model through machine learning based on the proposed features. Extensive experimental results demonstrate the distinguishing ability of social features to predict answer quality. Moreover, we make a systematic comparison of different groups of features and find that answer statistic features play a key role in improving the overall performance. In addition, we find that topic based features substantially outperform word based features.

Acta Automatica Sinica | 2013

Thread Segmentation Based Answer Detection in Chinese Online Forums

Baoxun Wang; Bingquan Liu; Chengjie Sun; Xiaolong Wang; Lin Sun

Detecting answers in threads is an essential task for online forum oriented question-answer (QA) pair mining. Forum threads normally contain implicit discussion structures whose indicative information can help answer detection models locate the best answers. This paper proposes a thread segmentation based answer detection approach: a forum thread is reorganized into several segments, and a group of features reflecting the discussion structures is extracted based on the segmentation results. Utilizing the segment information, a strategy is put forward to find the best answers. By evaluating the candidate answers in different types of segments with different models, the strategy filters out the samples that mislead the decision. The experimental results show that our approach is promising for mining QA resources in online forums.

fuzzy systems and knowledge discovery | 2010

A study of features on Primary Question detection in Chinese online forums

Lin Sun; Bingquan Liu; Baoxun Wang; Deyuan Zhang; Xiaolong Wang

Primary Question detection in online forums is a subtask of extracting question-answer pairs. In this paper, by surveying the forms of questions in Chinese online forums, a combination of textual and N-gram features obtained via feature selection is adopted to help detect primary questions. By viewing primary question detection as a binary classification problem, the decision tree classifier C4.5 and the support vector machine are each introduced to distinguish questions from non-questions. Experimental results across multiple datasets demonstrate that the mixture of textual and N-gram features performs better than using each of them separately under both C4.5 and the support vector machine. By computing the weight of each feature in the two classifiers, the top 6 features are found to be the same except for minor differences in order, showing that the combination of textual and N-gram features is universal and effective in detecting primary questions.

artificial intelligence and computational intelligence | 2009

Adaptive Maximum Marginal Relevance Based Multi-email Summarization

Baoxun Wang; Bingquan Liu; Chengjie Sun; Xiaolong Wang; Bo Li

By analyzing the inherent relationship between the maximum marginal relevance (MMR) model and the content cohesion of emails with the same subject, this paper presents an adaptive maximum marginal relevance based multi-email summarization method. By adopting an approximate computation of email content cohesion, the adaptive MMR is able to automatically adjust its parameters as the email sets change. The experimental results show that the email summarizing system based on this technique can increase precision while reducing the redundancy of the automatic summary results, and consequently improves the average quality of email summaries.
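
For readers unfamiliar with MMR, the following is a minimal sketch of the standard greedy selection rule, which scores each candidate as `lam * relevance - (1 - lam) * redundancy`. It is not the paper's adaptive method: the abstract does not give the rule that adjusts the parameter from email content cohesion, so `lam` is simply fixed here. The bag-of-words cosine similarity, the function names, and the sample sentences are all assumptions made for illustration.

```python
import math
from collections import Counter


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two bag-of-words vectors."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def mmr_select(sentences, query, k=2, lam=0.7):
    """Greedy MMR: pick up to k sentences balancing relevance and novelty.

    lam is fixed (an assumption); the adaptive variant described in the
    abstract would derive it from the cohesion of the email set.
    """
    vecs = [Counter(s.lower().split()) for s in sentences]
    qvec = Counter(query.lower().split())
    selected = []
    candidates = list(range(len(sentences)))
    while candidates and len(selected) < k:
        def score(i):
            relevance = cosine(vecs[i], qvec)
            # Similarity to the closest already-selected sentence.
            redundancy = max(
                (cosine(vecs[i], vecs[j]) for j in selected), default=0.0
            )
            return lam * relevance - (1 - lam) * redundancy
        best = max(candidates, key=score)
        selected.append(best)
        candidates.remove(best)
    return [sentences[i] for i in selected]


emails = [
    "meeting moved to friday",
    "meeting moved to friday",
    "budget report attached",
]
summary = mmr_select(emails, "meeting friday budget", k=2, lam=0.5)
```

In this toy run the duplicated sentence is skipped on the second pick because its redundancy penalty outweighs its relevance, so the summary keeps one copy plus the budget sentence.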

Computer Speech & Language | 2019

Enhancing generative conversational service agents with dialog history and external knowledge

Zongsheng Wang; Zhuoran Wang; Yinong Long; Jianan Wang; Zhen Xu; Baoxun Wang

For generative conversational agents, especially service-oriented systems, it is of great importance to improve the informativeness of generated responses and avoid bland results. In this paper, we describe our attempt at generating natural and informative responses for customer service oriented dialog systems by incorporating dialog history related information and external knowledge. Two improved sequence-to-sequence frameworks are proposed to generate responses based on extra information in addition to the current user input: one encodes the entire dialogue history, while the other integrates external knowledge extracted from a search engine. The experimental results on the DSTC6-Track2 and Ubuntu Dialog corpora demonstrate that the proposed systems are promising for generating more informative responses. However, case studies suggest that some particular features of the proposed systems and the datasets might prevent the systems from fully exploiting such extra information.