Yandong Liu
Emory University
Network
Latest external collaboration on country level. Dive into details by clicking on the dots.
Publication
Featured researches published by Yandong Liu.
international world wide web conferences | 2008
Jiang Bian; Yandong Liu; Eugene Agichtein; Hongyuan Zha
Community Question Answering has emerged as a popular and effective paradigm for a wide range of information needs. For example, to find out an obscure piece of trivia, it is now possible and even very effective to post a question on a popular community QA site such as Yahoo! Answers, and to rely on other users to provide answers, often within minutes. The importance of such community QA sites is magnified as they create archives of millions of questions and hundreds of millions of answers, many of which are invaluable for the information needs of other searchers. However, to make this immense body of knowledge accessible, effective answer retrieval is required. In particular, as any user can contribute an answer to a question, the majority of the content reflects personal, often unsubstantiated opinions. A ranking that combines both relevance and quality is required to make such archives usable for factual information retrieval. This task is challenging, as the structure and the contents of community QA archives differ significantly from the web setting. To address this problem we present a general ranking framework for factual information retrieval from social media. Results of a large scale evaluation demonstrate that our method is highly effective at retrieving well-formed, factual answers to questions, as evaluated on a standard factoid QA benchmark. We also show that our learning framework can be tuned with the minimum of manual labeling. Finally, we provide result analysis to gain deeper understanding of which features are significant for social media search and retrieval. Our system can be used as a crucial building block for combining results from a variety of social media content with general web search results, and to better integrate social media content for effective information access.
international acm sigir conference on research and development in information retrieval | 2008
Yandong Liu; Jiang Bian; Eugene Agichtein
Question answering communities such as Naver and Yahoo! Answers have emerged as popular, and often effective, means of information seeking on the web. By posting questions for other participants to answer, information seekers can obtain specific answers to their questions. Users of popular portals such as Yahoo! Answers already have submitted millions of questions and received hundreds of millions of answers from other participants. However, it may also take hours --and sometime days-- until a satisfactory answer is posted. In this paper we introduce the problem of predicting information seeker satisfaction in collaborative question answering communities, where we attempt to predict whether a question author will be satisfied with the answers submitted by the community participants. We present a general prediction model, and develop a variety of content, structure, and community-focused features for this task. Our experimental results, obtained from a largescale evaluation over thousands of real questions and user ratings, demonstrate the feasibility of modeling and predicting asker satisfaction. We complement our results with a thorough investigation of the interactions and information seeking patterns in question answering communities that correlate with information seeker satisfaction. Our models and predictions could be useful for a variety of applications such as user intent inference, answer ranking, interface design, and query suggestion and routing.
international world wide web conferences | 2009
Jiang Bian; Yandong Liu; Ding Zhou; Eugene Agichtein; Hongyuan Zha
Community Question Answering (CQA) has emerged as a popular forum for users to pose questions for other users to answer. Over the last few years, CQA portals such as Naver and Yahoo! Answers have exploded in popularity, and now provide a viable alternative to general purpose Web search. At the same time, the answers to past questions submitted in CQA sites comprise a valuable knowledge repository which could be a gold mine for information retrieval and automatic question answering. Unfortunately, the quality of the submitted questions and answers varies widely - increasingly so that a large fraction of the content is not usable for answering queries. Previous approaches for retrieving relevant and high quality content have been proposed, but they require large amounts of manually labeled data -- which limits the applicability of the supervised approaches to new sites and domains. In this paper we address this problem by developing a semi-supervised coupled mutual reinforcement framework for simultaneously calculating content quality and user reputation, that requires relatively few labeled examples to initialize the training process. Results of a large scale evaluation demonstrate that our methods are more effective than previous approaches for finding high-quality answers, questions, and users. More importantly, our quality estimation significantly improves the accuracy of search over CQA archives over the state-of-the-art methods.
international acm sigir conference on research and development in information retrieval | 2008
Baoli Li; Yandong Liu; Ashwin Ram; Ernest V. Garcia; Eugene Agichtein
In this paper we begin to investigate how to automatically determine the subjectivity orientation of questions posted by real users in community question answering (CQA) portals. Subjective questions seek answers containing private states, such as personal opinion and experience. In contrast, objective questions request objective, verifiable information, often with support from reliable sources. Knowing the question orientation would be helpful not only for evaluating answers provided by users, but also for guiding the CQA engine to process questions more intelligently. Our experiments on Yahoo! Answers data show that our method exhibits promising performance.
ACM Transactions on Knowledge Discovery From Data | 2009
Eugene Agichtein; Yandong Liu; Jiang Bian
Question Answering Communities such as Naver, Baidu Knows, and Yahoo! Answers have emerged as popular, and often effective, means of information seeking on the web. By posting questions for other participants to answer, information seekers can obtain specific answers to their questions. Users of CQA portals have already contributed millions of questions, and received hundreds of millions of answers from other participants. However, CQA is not always effective: in some cases, a user may obtain a perfect answer within minutes, and in others it may require hours—and sometimes days—until a satisfactory answer is contributed. We investigate the problem of predicting information seeker satisfaction in collaborative question answering communities, where we attempt to predict whether a question author will be satisfied with the answers submitted by the community participants. We present a general prediction model, and develop a variety of content, structure, and community-focused features for this task. Our experimental results, obtained from a large-scale evaluation over thousands of real questions and user ratings, demonstrate the feasibility of modeling and predicting asker satisfaction. We complement our results with a thorough investigation of the interactions and information seeking patterns in question answering communities that correlate with information seeker satisfaction. We also explore personalized models of asker satisfaction, and show that when sufficient interaction history exists, personalization can significantly improve prediction accuracy over a “one-size-fits-all” model. Our models and predictions could be useful for a variety of applications, such as user intent inference, answer ranking, interface design, and query suggestion and routing.
human factors in computing systems | 2011
Ryen W. White; Matthew Richardson; Yandong Liu
Social question-and-answer (Q&A) involves the location of answers to questions through communication with people. Social Q&A systems, such as mailing lists and Web forums are popular, but their asynchronous nature can lead to high answer latency. Synchronous Q&A systems facilitate real-time dialog, usually via instant messaging, but face challenges with interruption costs and the availability of knowledgeable answerers at question time. We ran a longitudinal study of a synchronous social Q&A system to investigate the effects of the rate with which potential answerers were contacted (trading off time-to-answer against interruption cost) and community size (varying total number of members). We found important differences in subjective and objective measures of system performance with these variations. Our findings help us understand the costs and benefits of varying contact rate and community size in synchronous social Q&A, and inform system design for social Q&A.
international acm sigir conference on research and development in information retrieval | 2008
Yandong Liu; Eugene Agichtein
While question answering communities have been gaining popularity for several years, we wonder if the increased popularity actually improves or degrades the user experience. In addition, automatic QA systems, which utilize different sources such as search engines and social media, are emerging rapidly. QA communities have already created abundant resources of millions of questions and hundreds of millions of answers. The question whether they will continue to serve as an effective source is of information for web search and question answering is of vital importance. In this poster, we investigate the temporal evolution of a popular QA community - Yahoo! Answers, with respect to its effectiveness in answering three basic types of questions: factoid, opinion and complex questions. Our experiments show that Yahoo! Answers keeps growing rapidly, while its overall quality as an information source for factoid question-answering degrades. However, instead of answering factoid questions, it might be more effective to answer opinion and complex questions.
international acm sigir conference on research and development in information retrieval | 2009
Yandong Liu; Nitya Narasimhan; Venu Vasudevan; Eugene Agichtein
As online Collaborative Question Answering (CQA) servicessuch as Yahoo! Answers and Baidu Knows are attracting users, questions, and answers at an explosive rate, the truly urgent and important questions are increasingly getting lost in the crowd. That is, questions that require immediate responses are pushed out of the way by the trivial but more recently arriving questions. Unlike other questions in collaborative question answering (CQA) for which users might be willing to wait until good answers appear, urgent questions are likely to be of interest to the asker only if answered in the next few minutes or hours. For such questions, late responses are either not useful or are simply not applicable. Unfortunately, current collaborative question-answering systems do not distinguish urgent questions from the rest, and could thus be ineffective for urgent information needs. We explore text- and data- mining methods for automatically identifying urgent questions in the CQA setting. Our results indicate that modeling the question context (i.e., the particular forum/category where the question was posted) can increase classification accuracy compared to the text of the question alone.
international conference on data mining | 2011
Kan Shao; Yandong Liu; Daniel B. Neill
We present Generalized Fast Subset Sums (GFSS), a new Bayesian framework for scalable and accurate detection of irregularly shaped spatial clusters using multiple data streams. GFSS extends the previously proposed Multivariate Bayesian Scan Statistic (MBSS) and Fast Subset Sums (FSS) approaches for detection of emerging events. The detection power of MBSS is primarily limited by computational considerations, which limit it to searching over circular spatial regions. GFSS enables more accurate and timely detection by defining a hierarchical prior over all subsets of the N locations, first selecting a local neighborhood consisting of a center location and its neighbors, and introducing a sparsity parameter p to describe how likely each location in the neighborhood is to be affected. This approach allows us to consider all possible subsets of locations (including irregularly-shaped regions) but also puts higher weight on more compact regions. We demonstrate that MBSS and FSS are both special cases of this general framework (assuming p = 1 and p = 0.5 respectively), but substantially higher detection power can be achieved by choosing an appropriate value of p. Thus we show that the distribution of the sparsity parameter p can be accurately learned from a small number of labeled events. Our evaluation results (on synthetic disease outbreaks injected into real-world hospital data) show that the GFSS method with learned sparsity parameter has higher detection power and spatial accuracy than MBSS and FSS, particularly when the affected region is irregular or elongated. We also show that the learned models can be used for event characterization, accurately distinguishing between two otherwise identical event types based on the sparsity of the affected spatial region.
adversarial information retrieval on the web | 2008
Jiang Bian; Yandong Liu; Eugene Agichtein; Hongyuan Zha