Yaoyun Zhang

Harbin Institute of Technology Shenzhen Graduate School

Publication

Featured research published by Yaoyun Zhang.

Proceedings of The Third CIPS-SIGHAN Joint Conference on Chinese Language Processing | 2014

Problematic Situation Analysis and Automatic Recognition for Chinese Online Conversational System

Yang Xiang; Yaoyun Zhang; Xiaoqiang Zhou; Xiaolong Wang; Yang Qin

Automatic problematic situation recognition (PSR) is important for an online conversational system to constantly improve its performance. A PSR module is responsible for automatically identifying users’ dissatisfaction and then sending feedback to conversation managers. In this paper, we collect dialogues from a Chinese online chatbot, annotate the problematic situations, and propose a framework to predict utterance-level problematic situations by integrating intent and sentiment factors. Unlike previous work, the research setting is open-domain, in which very few domain-specific textual features can be used, and the method is easy to adapt to other domains. Experimental results show that integrating both intent and sentiment factors achieves the best performance.

Systems, Man and Cybernetics | 2009

Using question classification to model user intentions of different levels

Yaoyun Zhang; Xuan Wang; Xiaolong Wang; Shixi Fan; Daoxu Zhang

Detecting user information needs is a fundamental issue in automatic question answering systems. Based on real questions collected from online question answering communities, this paper proposes a three-level question type taxonomy to model user information needs. The three levels are based on interrogative patterns, hidden user intentions, and specific answer expectations. One question can have multiple types at levels 2 and 3. Question type assignment at levels 2 and 3 is subjective and may vary between users. Shallow lexical, syntactic, and semantic features are used to model the inherent subjectivity of user intentions. Classification experiments are conducted on a corpus of real questions collected from the web, using different machine learning methods. Experimental results are promising, indicating that user information needs and subjectivity can be modeled statistically and that strong correlations exist between question types of the same level.

Mathematical Problems in Engineering | 2015

Bias Modeling for Distantly Supervised Relation Extraction

Yang Xiang; Yaoyun Zhang; Xiaolong Wang; Yang Qin; Wenying Han

Distant supervision (DS) automatically annotates free text with relation mentions from existing knowledge bases (KBs), providing a way to alleviate the problem of insufficient training data for relation extraction in natural language processing (NLP). However, the heuristic annotation process does not guarantee the correctness of the generated labels, which has made the efficient use of noisy training data an active research issue. In this paper, we model two types of biases to reduce noise: (1) bias-dist, which models the relative distance between points (instances) and classes (relation centers); and (2) bias-reward, which models the possibility of each heuristically generated label being incorrect. Based on these biases, we propose three noise-tolerant models, MIML-dist, MIML-dist-classify, and MIML-reward, built on top of a state-of-the-art distantly supervised learning algorithm. Experimental evaluations against three landmark methods on the KBP dataset validate the effectiveness of the proposed methods.

International Conference on Machine Learning and Cybernetics | 2010

CogQTaxo: Modeling human cognitive process with a three-dimensional question taxonomy

Yaoyun Zhang; Xiaolong Wang; Xuan Wang; Shixi Fan

In question answering systems, question taxonomy is commonly used as the representation of user information needs. This paper proposes CogQTaxo, a framework for a three-dimensional question taxonomy. The dimensions represent the surface information need, the implicit information needs, and the pragmatic expectations, respectively. The linguistic classification criteria employed follow the cognitive process of human question interpretation. A case study of the information needs of users new to a restricted domain is conducted. Inter-annotator agreement and machine learning classification experiments show that users in the same context have similar and predictable information needs in all three dimensions and that dependency relations exist between information needs of different dimensions.

International Conference on Neural Information Processing | 2015

Distant Supervision for Relation Extraction via Group Selection

Yang Xiang; Xiaolong Wang; Yaoyun Zhang; Yang Qin; Shixi Fan

Distant supervision (DS) aligns relations between named entities from a knowledge base (KB) with free text and automatically annotates the training corpus with relation mentions. One big challenge of DS is that the heuristically generated relation labels tend to be noisy when a pair of entities has multiple and/or incomplete relations in a KB. This paper proposes two ranking-based methods to reduce noise and select effective training data for multi-instance multi-label learning (MIML), one of the most popular learning paradigms for distantly supervised relation extraction. With the proposed methods, low-quality training groups are excluded from the training data according to different ranking strategies. Experimental evaluation on the KBP dataset using state-of-the-art MIML algorithms demonstrates that the proposed methods improve performance significantly.
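
The ranking-based exclusion described in this abstract can be pictured with a minimal sketch. It is not the paper's method: the abstract does not say how group quality is scored, so the `quality` callable, the `keep_ratio` parameter, and the toy scoring rule below are all hypothetical placeholders.

```python
from typing import Callable, Dict, List, Sequence


def select_groups(
    groups: Sequence[Dict],
    quality: Callable[[Dict], float],
    keep_ratio: float = 0.8,
) -> List[Dict]:
    """Rank training groups by a quality score and keep the top fraction.

    `quality` and `keep_ratio` are assumptions made for illustration;
    the abstract only states that low-quality groups are excluded
    according to a ranking strategy.
    """
    if not 0.0 < keep_ratio <= 1.0:
        raise ValueError("keep_ratio must be in (0, 1]")
    ranked = sorted(groups, key=quality, reverse=True)
    if not ranked:
        return []
    n_keep = max(1, int(len(ranked) * keep_ratio))
    return ranked[:n_keep]


# Toy data: one group per entity pair, with hypothetical counts.
toy_groups = [
    {"pair": ("A", "B"), "mentions": 8, "labels": 1},
    {"pair": ("C", "D"), "mentions": 2, "labels": 4},
    {"pair": ("E", "F"), "mentions": 6, "labels": 2},
    {"pair": ("G", "H"), "mentions": 1, "labels": 5},
]


def toy_quality(group: Dict) -> float:
    # Placeholder score: more mentions per label counts as "cleaner".
    return group["mentions"] / group["labels"]


kept = select_groups(toy_groups, toy_quality, keep_ratio=0.5)
kept_pairs = [g["pair"] for g in kept]
```

Any scoring function can be swapped in for `toy_quality`; the only fixed idea taken from the abstract is "rank the groups, then drop the low-ranked ones before training".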

International Conference on Machine Learning and Cybernetics | 2014

A Chinese question answering system based on web search

Zengjian Liu; Xiaolong Wang; Qingcai Chen; Yaoyun Zhang; Yang Xiang

With the rapid development of search engine technology, the massive amount of information on the internet has become increasingly easy to search and use. However, reading through a large number of search result pages is still hard work for users, so how to obtain answers conveniently and directly has become a recent research focus. In this paper, we put forward a Chinese question answering system that uses real-time web information retrieved by search engines. By entering a natural language question, users can get an accurate answer. Our system extracts answers in three main steps. The first is question analysis, which extracts the keywords and the type of the question. The second is retrieving relevant pages through web search engines. The last and most important step is answer extraction, which evaluates all extracted candidate answers and returns the one with the highest score as the final answer. In addition to implementing the system, we evaluated its performance on an artificially constructed question-answer dataset. The results demonstrate the feasibility of our system.
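
The three steps named in this abstract (question analysis, web retrieval, answer extraction with scoring) can be sketched roughly as below. This is an illustration under loud assumptions, not the authors' system: it uses English whitespace-style tokenization instead of Chinese processing, a stubbed `search` callable in place of a real search engine, and a made-up keyword-overlap score. All function names and the stopword list are hypothetical.

```python
import re
from collections import Counter
from typing import Callable, List, Optional, Tuple

# Hypothetical stopword list, for illustration only.
STOPWORDS = {"who", "what", "when", "where", "is", "the", "of", "a", "an"}


def analyze_question(question: str) -> Tuple[List[str], str]:
    """Step 1: extract keywords and a coarse question type."""
    tokens = re.findall(r"\w+", question.lower())
    qtype = "PERSON" if "who" in tokens else "OTHER"
    keywords = [t for t in tokens if t not in STOPWORDS]
    return keywords, qtype


def extract_answer(keywords: List[str], snippets: List[str]) -> Optional[str]:
    """Step 3: score candidate answers and return the highest-scoring one.

    Candidates are capitalized words that are not query keywords; each
    candidate earns the keyword overlap of every snippet it appears in.
    """
    scores: Counter = Counter()
    for snippet in snippets:
        snippet_tokens = set(re.findall(r"\w+", snippet.lower()))
        overlap = sum(1 for k in keywords if k in snippet_tokens)
        for cand in re.findall(r"\b[A-Z][a-z]+\b", snippet):
            if cand.lower() not in keywords and cand.lower() not in STOPWORDS:
                scores[cand] += overlap
    return scores.most_common(1)[0][0] if scores else None


def answer(question: str, search: Callable[[str], List[str]]) -> Optional[str]:
    """Run the three steps; `search` stands in for step 2 (web retrieval)."""
    keywords, _qtype = analyze_question(question)  # type is unused in this sketch
    snippets = search(" ".join(keywords))
    return extract_answer(keywords, snippets)


def fake_search(query: str) -> List[str]:
    # Stub returning canned snippets instead of calling a search engine.
    return [
        "Tolstoy wrote War and Peace.",
        "War and Peace is a novel by Tolstoy.",
        "Peace was signed in Paris.",
    ]


result = answer("Who wrote War and Peace?", fake_search)
```

Passing the retrieval step in as a callable keeps the sketch runnable offline; a real system would replace `fake_search` with calls to a search engine and use the question type to filter candidates.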

International Conference on Neural Information Processing | 2011

Diversifying Question Recommendations in Community-Based Question Answering

Yaoyun Zhang; Xiaolong Wang; Xuan Wang; Ruifeng Xu; Buzhou Tang

Question retrieval is an important research topic in community-based question answering (QA). Conventionally, questions semantically equivalent to the query question are ranked at the top. However, traditional question retrieval techniques have difficulty handling users’ information needs that are implicitly embedded in the question. This paper proposes a novel method of question recommendation that considers users’ diverse information needs. By estimating information need compactness in the question retrieval results, we identify the retrieval results that need to be diversified. For these results, the scores of the information retrieval model and the importance and novelty of both question types and the informational aspects of question content are combined to produce diverse question recommendations. Comparative experiments on a large-scale real community-based QA dataset show that the proposed method effectively improves information need coverage and diversity through the recommendation of relevant questions.

Conference on Computational Natural Language Learning | 2013