Nan Duan | Researchain

Publication

Featured researches published by Nan Duan.

meeting of the association for computational linguistics | 2014

Knowledge-Based Question Answering as Machine Translation

Junwei Bao; Nan Duan; Ming Zhou; Tiejun Zhao

A typical knowledge-based question answering (KB-QA) system faces two challenges: one is to transform natural language questions into their meaning representations (MRs); the other is to retrieve answers from knowledge bases (KBs) using generated MRs. Unlike previous methods which treat them in a cascaded manner, we present a translation-based approach to solve these two tasks in one unified framework. We translate questions to answers based on CYK parsing. Answers as translations of the span covered by each CYK cell are obtained by a question translation method, which first generates formal triple queries as MRs for the span based on question patterns and relation expressions, and then retrieves answers from a given KB based on triple queries generated. A linear model is defined over derivations, and minimum error rate training is used to tune feature weights based on a set of question-answer pairs. Compared to a KB-QA system using a state-of-the-art semantic parser, our method achieves better results.

conference on information and knowledge management | 2015

Answering Questions with Complex Semantic Constraints on Open Knowledge Bases

Pengcheng Yin; Nan Duan; Ben Kao; Junwei Bao; Ming Zhou

A knowledge-based question-answering system (KB-QA) is one that answers natural language questions with information stored in a large-scale knowledge base (KB). Existing KB-QA systems are either powered by curated KBs in which factual knowledge is encoded in entities and relations with well-structured schemas, or by open KBs, which contain assertions represented in the form of triples (e.g., subject; relation phrase; argument). We show that both approaches fall short in answering questions with complex prepositional or adverbial constraints. We propose using n-tuple assertions, which are assertions with an arbitrary number of arguments, and n-tuple open KB (nOKB), which is an open knowledge base of n-tuple assertions. We present TAQA, a novel KB-QA system that is based on an nOKB and illustrate via experiments how TAQA can effectively answer complex questions with rich semantic constraints. Our work also results in a new open KB containing 120M n-tuple assertions and a collection of 300 labeled complex questions, which is made publicly available for further research.
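To make the difference between a triple and an n-tuple assertion concrete, the following is a minimal sketch, not TAQA's actual implementation. The class `NTupleAssertion`, the function `answer`, the exact-match lookup, and the toy assertions are all hypothetical and chosen only for illustration; a real system would match relation phrases and constraints far more loosely.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass(frozen=True)
class NTupleAssertion:
    """Hypothetical open-KB assertion: a subject, a relation phrase, and any number of arguments."""
    subject: str
    relation: str
    arguments: Tuple[str, ...]


def answer(kb: List[NTupleAssertion], subject: str, relation: str,
           constraints: List[str]) -> List[str]:
    """Return the arguments of matching assertions that are not themselves constraints.

    An assertion matches when its subject and relation are equal to the query's
    and every constraint appears among its arguments (exact string match, an
    assumption made to keep the sketch short).
    """
    answers: List[str] = []
    for assertion in kb:
        if assertion.subject != subject or assertion.relation != relation:
            continue
        if all(c in assertion.arguments for c in constraints):
            answers.extend(arg for arg in assertion.arguments if arg not in constraints)
    return answers


# Toy data, invented for this example. A triple could hold only one of the two arguments.
kb = [
    NTupleAssertion("the team", "played", ("in Boston", "in 2010")),
    NTupleAssertion("the team", "played", ("in Chicago", "in 2012")),
]
result = answer(kb, "the team", "played", ["in 2012"])
print(result)
```

Because each assertion keeps both the location and the time argument, the temporal constraint can select the right assertion, which a store of separate (subject; relation; argument) triples could not do without extra joining logic.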

meeting of the association for computational linguistics | 2016

DocChat: An Information Retrieval Approach for Chatbot Engines Using Unstructured Documents

Zhao Yan; Nan Duan; Junwei Bao; Peng Chen; Ming Zhou; Zhoujun Li; Jianshe Zhou

Most current chatbot engines are designed to reply to user utterances based on existing utterance-response (or Q-R) pairs. In this paper, we present DocChat, a novel information retrieval approach for chatbot engines that can leverage unstructured documents, instead of Q-R pairs, to respond to utterances. A learning to rank model with features designed at different levels of granularity is proposed to measure the relevance between utterances and responses directly. We evaluate our proposed approach in both English and Chinese: (i) For English, we evaluate DocChat on WikiQA and QASent, two answer sentence selection tasks, and compare it with state-of-the-art methods. Reasonable improvements and good adaptability are observed. (ii) For Chinese, we compare DocChat with XiaoIce, a famous chitchat engine in China, and side-by-side evaluation shows that DocChat is a perfect complement for chatbot engines using Q-R pairs as main source of responses.

Neurocomputing | 2018

Content-Based Table Retrieval for Web Queries

Zhao Yan; Duyu Tang; Nan Duan; Junwei Bao; Yuanhua Lv; Ming Zhou; Zhoujun Li

Understanding the connections between unstructured text and semi-structured table is an important yet neglected problem in natural language processing. In this work, we focus on content-based table retrieval. Given a query, the task is to find the most relevant table from a collection of tables. Further progress towards improving this area requires powerful models of semantic matching and richer training and evaluation resources. To remedy this, we present a ranking based approach, and implement both carefully designed features and neural network architectures to measure the relevance between a query and the content of a table. Furthermore, we release an open-domain dataset that includes 21,113 web queries for 273,816 tables. We conduct comprehensive experiments on both real world and synthetic datasets. Results verify the effectiveness of our approach and present the challenges for this task.

National CCF Conference on Natural Language Processing and Chinese Computing | 2017

An Information Retrieval-Based Approach to Table-Based Question Answering

Junwei Bao; Nan Duan; Ming Zhou; Tiejun Zhao

We propose a simple yet effective information retrieval based approach to answer complex questions with open domain web tables. Specifically, given a question and a table, we rank all table cells based on their representations, and select the cells with the highest ranking score as the answer. To represent a cell, we design rich features which leverage both the semantic information of the question and the structure information of the table. The experiments are conducted on the WIKITABLEQUESTIONS dataset, in which the questions have complex semantics. Compared to a semantic parsing based method, our approach improves the accuracy score by 6.03 points.
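The rank-and-select step described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the paper's feature set or model: the two token-overlap features, the hand-set `weights`, and the names `cell_features` and `rank_cells` are hypothetical stand-ins for the richer question and table-structure features the abstract mentions.

```python
from typing import List, Tuple


def cell_features(question: str, header: List[str], table: List[List[str]],
                  r: int, c: int) -> List[float]:
    """Two illustrative features for the cell at row r, column c (assumed, not from the paper)."""
    q_tokens = set(question.lower().split())
    header_tokens = set(header[c].lower().split())
    # Tokens from the other cells in the same row: a crude use of table structure.
    row_tokens = {
        tok
        for j, cell in enumerate(table[r]) if j != c
        for tok in cell.lower().split()
    }
    return [
        float(len(q_tokens & header_tokens)),  # question vs. column header overlap
        float(len(q_tokens & row_tokens)),     # question vs. rest-of-row overlap
    ]


def rank_cells(question: str, header: List[str], table: List[List[str]],
               weights: List[float]) -> List[Tuple[float, int, int, str]]:
    """Score every cell with a linear model and return cells sorted best first."""
    scored = []
    for r, row in enumerate(table):
        for c, cell in enumerate(row):
            feats = cell_features(question, header, table, r, c)
            score = sum(w * f for w, f in zip(weights, feats))
            scored.append((score, r, c, cell))
    # Highest score first; ties broken by table position for determinism.
    scored.sort(key=lambda t: (-t[0], t[1], t[2]))
    return scored


# Toy table, invented for this example.
header = ["city", "country", "population"]
table = [
    ["Paris", "France", "2100000"],
    ["Berlin", "Germany", "3600000"],
]
ranking = rank_cells("what is the population of berlin", header, table, [1.0, 1.0])
best_cell = ranking[0][3]
print(best_cell)
```

In this toy case the top-ranked cell is the population entry in the Berlin row, because it is the only cell whose column header and row neighbours both overlap with the question. In practice the weights would be learned from question-answer pairs rather than set by hand.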

Knowledge Based Systems | 2017

Response selection from unstructured documents for human-computer conversation systems

Zhao Yan; Nan Duan; Junwei Bao; Peng Chen; Ming Zhou; Zhoujun Li

This paper studies response selection for human-computer conversation systems. Existing retrieval-based human-computer conversation systems are intended to reply to user utterances based on existing utterance-response pairs. However, collecting sufficient utterance-response pairs is intractable in practical situations, especially for many specific domains. We introduce DocChat, a novel information retrieval approach for human-computer conversation systems that can use unstructured documents, rather than semi-structured utterance-response pairs, to react to user utterances. The key to DocChat is a learning to rank model, with features designed at various levels of granularity, which quantifies the relevance between utterances and responses directly. We conduct comprehensive experiments on both sentence selection and real human-computer conversation scenarios. Empirical studies on sentence selection datasets show reasonable improvements and the strong adaptability of our model. We compare DocChat with XiaoIce, a famous open domain chitchat engine in China. Side-by-side evaluation shows that DocChat is a good complement for human-computer conversation systems using utterance-response pairs as the primary source of responses. Furthermore, we release a large scale open-domain dataset for sentence selection which contains 304,413 query-sentence pairs.

NLPCC/ICCPOL | 2016

An Open Domain Topic Prediction Model for Answer Selection

Zhao Yan; Nan Duan; Ming Zhou; Zhoujun Li; Jianshe Zhou

We present an open domain topic prediction model for the answer selection task. Different from previous unsupervised topic modeling methods, we automatically extract high quality and large scale ⟨sentence, topic⟩ pairs from Wikipedia as labeled data, and train an open domain topic prediction model based on convolutional neural network, which can predict the most possible topics for each given input sentence. To verify the usefulness of our proposed approach, we add the topic prediction model into an end-to-end open domain question answering system and evaluate it on the answer selection task, and improvements are obtained on both WikiQA and QASent datasets.

international conference on computational linguistics | 2016