Huan Sun

University of California, Santa Barbara

Publication

Featured research published by Huan Sun.

IEEE Transactions on Knowledge and Data Engineering | 2014

Interpreting the Public Sentiment Variations on Twitter

Shulong Tan; Yang Li; Huan Sun; Ziyu Guan; Xifeng Yan; Jiajun Bu; Chun Chen; Xiaofei He

Millions of users share their opinions on Twitter, making it a valuable platform for tracking and analyzing public sentiment. Such tracking and analysis can provide critical information for decision making in various domains. Therefore it has attracted attention in both academia and industry. Previous research mainly focused on modeling and tracking public sentiment. In this work, we move one step further to interpret sentiment variations. We observed that emerging topics (named foreground topics) within the sentiment variation periods are highly related to the genuine reasons behind the variations. Based on this observation, we propose a Latent Dirichlet Allocation (LDA) based model, Foreground and Background LDA (FB-LDA), to distill foreground topics and filter out longstanding background topics. These foreground topics can give potential interpretations of the sentiment variations. To further enhance the readability of the mined reasons, we select the most representative tweets for foreground topics and develop another generative model called Reason Candidate and Background LDA (RCB-LDA) to rank them with respect to their “popularity” within the variation period. Experimental results show that our methods can effectively find foreground topics and rank reason candidates. The proposed models can also be applied to other tasks such as finding topic differences between two sets of documents.
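The idea of separating emerging foreground terms from longstanding background terms can be illustrated with a much simpler stand-in. The sketch below is not FB-LDA: it swaps in a smoothed log-ratio of word frequencies between two document sets. The function name `foreground_terms` and the example tweets are hypothetical and chosen only for illustration.

```python
from collections import Counter
import math


def foreground_terms(foreground_docs, background_docs, top_k=3):
    """Rank words by a smoothed log-ratio of foreground vs. background frequency.

    This is a frequency heuristic, not a topic model: it only highlights words
    that are relatively more common in the foreground set.
    """
    fg = Counter(w for d in foreground_docs for w in d.lower().split())
    bg = Counter(w for d in background_docs for w in d.lower().split())
    vocab = set(fg) | set(bg)
    # Add-one smoothing so words unseen in one set do not cause log(0).
    fg_total = sum(fg.values()) + len(vocab)
    bg_total = sum(bg.values()) + len(vocab)
    scores = {
        w: math.log((fg[w] + 1) / fg_total) - math.log((bg[w] + 1) / bg_total)
        for w in fg
    }
    # Sort by descending score, breaking ties alphabetically.
    return sorted(scores, key=lambda w: (-scores[w], w))[:top_k]


# Hypothetical tweets: the background period vs. the sentiment variation period.
background = [
    "love the new phone",
    "the phone camera is great",
    "great battery on the phone",
]
foreground = [
    "phone battery recall announced",
    "battery recall is worrying",
    "recall hits the phone",
]
top_terms = foreground_terms(foreground, background, top_k=2)
print(top_terms)
```

Words that are frequent in both periods, such as "phone", score low, while words concentrated in the variation period rise to the top.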

international world wide web conferences | 2015

Open Domain Question Answering via Semantic Enrichment

Huan Sun; Hao Ma; Wen-tau Yih; Chen-Tse Tsai; Jingjing Liu; Ming-Wei Chang

Most recent question answering (QA) systems query large-scale knowledge bases (KBs) to answer a question, after parsing and transforming natural language questions to KB-executable forms (e.g., logical forms). It is well known that KBs are far from complete, so information required to answer questions may not always exist in KBs. In this paper, we develop a new QA system that mines answers directly from the Web, and meanwhile employs KBs as a significant auxiliary to further boost the QA performance. Specifically, to the best of our knowledge, we make the first attempt to link answer candidates to entities in Freebase, during answer candidate generation. Several remarkable advantages follow: (1) Redundancy among answer candidates is automatically reduced. (2) The types of an answer candidate can be effortlessly determined by those of its corresponding entity in Freebase. (3) Capitalizing on the rich information about entities in Freebase, we can develop semantic features for each answer candidate after linking them to Freebase. Particularly, we construct answer-type related features with two novel probabilistic models, which directly evaluate the appropriateness of an answer candidate's types under a given question. Overall, such semantic features turn out to play significant roles in determining the true answers from the large answer candidate pool. The experimental results show that across two testing datasets, our QA system achieves an 18% to 54% improvement under the F1 metric, compared with various existing QA systems.

international world wide web conferences | 2016

Table Cell Search for Question Answering

Huan Sun; Hao Ma; Xiaodong He; Wen-tau Yih; Yu Su; Xifeng Yan

Tables are pervasive on the Web. Informative web tables range across a large variety of topics, which can naturally serve as a significant resource to satisfy user information needs. Driven by such observations, in this paper, we investigate an important yet largely under-addressed problem: Given millions of tables, how to precisely retrieve table cells to answer a user question. This work proposes a novel table cell search framework to attack this problem. We first formulate the concept of a relational chain which connects two cells in a table and represents the semantic relation between them. With the help of search engine snippets, our framework generates a set of relational chains pointing to potentially correct answer cells. We further employ deep neural networks to conduct more fine-grained inference on which relational chains best match the input question and finally extract the corresponding answer cells. Based on millions of tables crawled from the Web, we evaluate our framework in the open-domain question answering (QA) setting, using both the well-known WebQuestions dataset and user queries mined from Bing search engine logs. On WebQuestions, our framework is comparable to state-of-the-art QA systems based on knowledge bases (KBs), while on Bing queries, it outperforms other systems with a 56.7% relative gain. Moreover, when combined with results from our framework, KB-based QA performance can obtain a relative improvement of 28.1% to 66.7%, demonstrating that web tables supply rich knowledge that might not exist or is difficult to identify in existing KBs.
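A relational chain, as described above, links two cells in a table. The following minimal sketch shows one way candidate chains could be enumerated. It assumes that a chain connects a cell matching the question's topic entity to another cell in the same row, labeled by the two column headers. That restriction, the exact-match lookup, the `RelationalChain` type, and the toy table are all assumptions made for illustration. The snippet-based generation and neural ranking from the paper are not shown.

```python
from typing import NamedTuple


class RelationalChain(NamedTuple):
    """Hypothetical container: a link from a topic cell to a candidate answer cell."""
    topic_column: str
    answer_column: str
    topic_cell: str
    answer_cell: str


def candidate_chains(table, topic_entity):
    """Enumerate chains from cells matching the topic entity to other cells in the same row."""
    chains = []
    for row in table:
        for topic_col, topic_val in row.items():
            # Simplifying assumption: case-insensitive exact match on the cell text.
            if topic_val.lower() != topic_entity.lower():
                continue
            for answer_col, answer_val in row.items():
                if answer_col != topic_col:
                    chains.append(
                        RelationalChain(topic_col, answer_col, topic_val, answer_val)
                    )
    return chains


# Toy table represented as a list of rows, each mapping a column header to a cell.
table = [
    {"Country": "France", "Capital": "Paris", "Currency": "Euro"},
    {"Country": "Japan", "Capital": "Tokyo", "Currency": "Yen"},
]
chains = candidate_chains(table, "japan")
for chain in chains:
    print(f"{chain.topic_column} -> {chain.answer_column}: {chain.answer_cell}")
```

For a question such as "What is the capital of Japan?", a ranking step would then need to prefer the Country -> Capital chain over Country -> Currency. That matching step is what the abstract attributes to deep neural networks.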

knowledge discovery and data mining | 2015

Exploiting Relevance Feedback in Knowledge Graph Search

Yu Su; Shengqi Yang; Huan Sun; Mudhakar Srivatsa; Sue E. Kase; Michelle Vanni; Xifeng Yan

The big data era is witnessing a prevalent shift of data from homogeneous to heterogeneous, from isolated to linked. Exemplar outcomes of this shift are a wide range of graph data such as information, social, and knowledge graphs. The unique characteristics of graph data are challenging traditional search techniques like SQL and keyword search. Graph query is emerging as a promising complementary search form. In this paper, we study how to improve graph query by relevance feedback. Specifically, we focus on knowledge graph query, and formulate the graph relevance feedback (GRF) problem. We propose a general GRF framework that is able to (1) tune the original ranking function based on user feedback and (2) further enrich the query itself by mining new features from user feedback. As a consequence, a query-specific ranking function is generated, which is better aligned with the user search intent. Given a newly learned ranking function based on user feedback, we further investigate whether we should re-rank the existing answers, or choose to search from scratch. We propose a strategy to train a binary classifier to predict which action will be more beneficial for a given query. The GRF framework is applied to searching DBpedia with graph queries derived from YAGO and Wikipedia. Experimental results show that GRF can improve the mean average precision by 80% to 100%.
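The mean average precision (MAP) figure quoted above is a standard ranking metric. The sketch below computes it for toy rankings and derives a relative improvement. The function names, the `before` and `after` rankings, and the relevance sets are hypothetical. This is not the paper's evaluation code or data.

```python
def average_precision(ranked, relevant):
    """Average of precision@k over the ranks k at which a relevant item appears."""
    hits = 0
    precision_sum = 0.0
    for rank, item in enumerate(ranked, start=1):
        if item in relevant:
            hits += 1
            precision_sum += hits / rank
    # Normalize by the number of relevant items, so missed items count against the score.
    return precision_sum / len(relevant) if relevant else 0.0


def mean_average_precision(runs):
    """Mean of average precision over (ranked_list, relevant_set) pairs, one per query."""
    return sum(average_precision(ranked, rel) for ranked, rel in runs) / len(runs)


# Hypothetical single-query example: the same answers ranked before and after feedback.
before = [(["a", "b", "c", "d"], {"c", "d"})]
after = [(["c", "a", "d", "b"], {"c", "d"})]

map_before = mean_average_precision(before)
map_after = mean_average_precision(after)
relative_gain = (map_after - map_before) / map_before
print(f"MAP before: {map_before:.3f}, after: {map_after:.3f}, gain: {relative_gain:.0%}")
```

A relative gain of this kind compares the re-ranked list against the original ranking for the same set of queries.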

knowledge discovery and data mining | 2014

Analyzing expert behaviors in collaborative networks

Huan Sun; Mudhakar Srivatsa; Shulong Tan; Yang Li; Lance M. Kaplan; Shu Tao; Xifeng Yan

Collaborative networks are composed of experts who cooperate with each other to complete specific tasks, such as resolving problems reported by customers. A task is posted and subsequently routed in the network from one expert to another until it is resolved. When an expert cannot solve a task, his routing decision (i.e., where to transfer a task) is critical since it can significantly affect the completion time of a task. In this work, we attempt to deduce the cognitive process of task routing, and model the decision making of experts as a generative process where a routing decision is made based on mixed routing patterns. In particular, we observe an interesting phenomenon that an expert tends to transfer a task to someone whose knowledge is neither too similar to nor too different from his own. Based on this observation, an expertise difference based routing pattern is developed. We formalize multiple routing patterns by taking into account both rational and random analysis of tasks, and present a generative model to combine them. For a held-out set of tasks, our model not only explains their real routing sequences very well, but also accurately predicts their completion time. Under three different quality measures, our method significantly outperforms all the alternatives with more than 75% accuracy gain. In practice, with the help of our model, hypotheses on how to improve a collaborative network can be tested quickly and reliably, thereby significantly easing performance improvement of collaborative networks.

empirical methods in natural language processing | 2016

On Generating Characteristic-rich Question Sets for QA Evaluation

Yu Su; Huan Sun; Brian M. Sadler; Mudhakar Srivatsa; Izzeddin Gur; Zenghui Yan; Xifeng Yan

We present a semi-automated framework for constructing factoid question answering (QA) datasets, where an array of question characteristics are formalized, including structure complexity, function, commonness, answer cardinality, and paraphrasing. Instead of collecting questions and manually characterizing them, we employ a reverse procedure, first generating a kind of graph-structured logical forms from a knowledge base, and then converting them into questions. Our work is the first to generate questions with explicitly specified characteristics for QA evaluation. We construct a new QA dataset with over 5,000 logical form-question pairs, associated with answers from the knowledge base, and show that datasets constructed in this way enable fine-grained analyses of QA systems. The dataset can be found at https://github.com/ysu1989/GraphQuestions.

international world wide web conferences | 2013

Synthetic review spamming and defense

Alex Morales; Huan Sun; Xifeng Yan

Online reviews are widely adopted in many websites such as Amazon, Yelp, and TripAdvisor. Positive reviews can bring significant financial gains, while negative ones often cause sales loss. This fact, unfortunately, results in strong incentives for opinion spam to mislead readers. Instead of hiring humans to write deceptive reviews, in this work, we bring into attention an automated, low-cost process for generating fake reviews, variations of which could be easily employed by evil attackers in reality. To the best of our knowledge, we are the first to expose the potential risk of machine-generated deceptive reviews. Our simple review synthesis model uses one truthful review as a template, and replaces its sentences with those from other reviews in a repository. The fake reviews generated by this mechanism are extremely hard to detect: Both the state-of-the-art machine detectors and human readers have an error rate of 35%-48%. A novel defense method that leverages the difference of semantic flows between fake and truthful reviews is developed, reducing the detection error rate to approximately 22%. Nevertheless, it is still a challenging research task to further decrease the error rate.

international world wide web conferences | 2018

StaQC: A Systematically Mined Question-Code Dataset from Stack Overflow

Ziyu Yao; Daniel S. Weld; Wei-Peng Chen; Huan Sun

Stack Overflow (SO) has been a great source of natural language questions and their code solutions (i.e., question-code pairs), which are critical for many tasks including code retrieval and annotation. In most existing research, question-code pairs were collected heuristically and tend to have low quality. In this paper, we investigate a new problem of systematically mining question-code pairs from Stack Overflow (in contrast to heuristically collecting them). It is formulated as predicting whether or not a code snippet is a standalone solution to a question. We propose a novel Bi-View Hierarchical Neural Network which can capture both the programming content and the textual context of a code snippet (i.e., two views) to make a prediction. On two manually annotated datasets in the Python and SQL domains, our framework substantially outperforms heuristic methods with at least 15% higher F1 and accuracy. Furthermore, we present StaQC (Stack Overflow Question-Code pairs), the largest dataset to date of ~148K Python and ~120K SQL question-code pairs, automatically mined from SO using our framework. Under various case studies, we demonstrate that StaQC can greatly help develop data-hungry models for associating natural language with programming language.

siam international conference on data mining | 2016

Distributed Representations of Expertise

Fangqiu Han; Shulong Tan; Huan Sun; Mudhakar Srivatsa; Deng Cai; Xifeng Yan

Collaborative networks are common in real life, where domain experts work together to solve tasks issued by customers. How to model the proficiency of experts is critical for us to understand and optimize collaborative networks. Traditional expertise models, such as topic model based methods, cannot capture two aspects of human expertise simultaneously: Specialization (what area is an expert good at?) and Proficiency Level (to what degree?). In this paper, we propose new models to overcome this problem. We embed all historical task data in a lower-dimensional space and learn vector representations of expertise based on both solved and unsolved tasks. Specifically, in our first model, we assume that each expert will only handle tasks whose difficulty level just matches his/her proficiency level, while in the second model each expert accepts tasks whose levels are equal to or lower than his/her proficiency level. Experiments on real world datasets show that both models outperform topic model based approaches and standard classifiers such as logistic regression and support vector machine in terms of prediction accuracy. The learnt vector representations can be used to compare expertise in a large organization and optimize expert allocation.

knowledge discovery and data mining | 2014

Network mining and analysis for social applications

Feida Zhu; Huan Sun; Xifeng Yan

The recent blossoming of social network and communication services in both public and corporate settings has generated a staggering amount of network data of all kinds. Unlike the bio-networks and the chemical compound graph data often used in traditional network mining and analysis, the new network data grown out of the social applications are characterized by their rich attributes, high heterogeneity, enormous sizes and complex patterns of various semantic meanings, all of which have posed significant research challenges to the graph/network mining community. In this tutorial, we aim to examine some recent advances in network mining and analysis for social applications, covering a diverse collection of methodologies and applications from the perspectives of event, relationship, collaboration, and network pattern. We present the problem settings, the challenges, the recent research advances and some future directions for each perspective. Topics include but are not limited to correlation mining, iceberg finding, anomaly detection, relationship discovery, information flow, task routing, and pattern mining.
