Yu Su
University of California, Santa Barbara
Publications
Featured research published by Yu Su.
Conference on Information and Knowledge Management | 2017
Yu Su; Ahmed Hassan Awadallah; Madian Khabsa; Patrick Pantel; Michael Gamon; Mark J. Encarnacion
As the Web evolves towards a service-oriented architecture, application program interfaces (APIs) are becoming an increasingly important way to provide access to data, services, and devices. We study the problem of natural language interfaces to APIs (NL2APIs), with a focus on web APIs for web services. Such NL2APIs have many potential benefits, for example, facilitating the integration of web services into virtual assistants. We propose the first end-to-end framework to build an NL2API for a given web API. A key challenge is to collect training data, i.e., NL command-API call pairs, from which an NL2API can learn the semantic mapping from ambiguous, informal NL commands to formal API calls. We propose a novel approach to collect training data for NL2APIs via crowdsourcing, where crowd workers are employed to generate diversified NL commands. We optimize the crowdsourcing process to further reduce the cost. More specifically, we propose a novel hierarchical probabilistic model for the crowdsourcing process, which guides us to allocate budget to those API calls that have a high value for training NL2APIs. We apply our framework to real-world APIs, and show that it can collect high-quality training data at a low cost, and build NL2APIs with good performance from scratch. We also show that our modeling of the crowdsourcing process can improve its effectiveness, such that the training data collected via our approach leads to better performance of NL2APIs than a strong baseline.
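A minimal sketch of the budget-allocation idea described above, assuming a hypothetical per-call value estimate in place of the paper's hierarchical probabilistic model (the API call strings, value estimates, and decay factor are illustrative only):

```python
# Minimal sketch of budget allocation for crowdsourced NL command collection.
# A hypothetical per-call value estimate stands in for the paper's hierarchical
# probabilistic model; each extra annotation of the same API call is discounted.
from dataclasses import dataclass, field
import heapq

@dataclass(order=True)
class Candidate:
    neg_value: float                      # negated value so heapq pops the most valuable call first
    api_call: str = field(compare=False)
    annotations: int = field(compare=False, default=0)

def allocate_budget(call_values: dict, budget: int, decay: float = 0.5) -> dict:
    """Greedily assign `budget` crowd annotation tasks to API calls."""
    heap = [Candidate(-value, call) for call, value in call_values.items()]
    heapq.heapify(heap)
    allocation = {call: 0 for call in call_values}
    for _ in range(budget):
        best = heapq.heappop(heap)
        allocation[best.api_call] += 1
        # Discount the value of annotating the same call again, then push it back.
        heapq.heappush(heap, Candidate(best.neg_value * decay, best.api_call,
                                       best.annotations + 1))
    return allocation

if __name__ == "__main__":
    # Hypothetical API calls and value estimates.
    values = {"GET-Messages{FILTER(isRead=False)}": 0.9,
              "GET-Events{ORDERBY(start)}": 0.6,
              "GET-Files{SEARCH(report)}": 0.3}
    print(allocate_budget(values, budget=10))
```

The greedy loop simply favors API calls whose next annotation is estimated to help the NL2API most; in the paper, such values come from its probabilistic model of the crowdsourcing process rather than fixed numbers.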
Web Search and Data Mining | 2018
Xiang Ren; Craig A. Knoblock; William Yang Wang; Yu Su
1. Motivation and Goals. The success of data mining and search technologies is largely attributed to the efficient and effective analysis of structured data. The construction of a well-structured, machine-actionable database from raw data sources is often the premise of consequent applications. Meanwhile, the ability to mine and reason over such constructed databases is at the core of powering various downstream applications on web and mobile devices. Recently, we have witnessed a significant amount of interest in building large-scale knowledge bases (KBs) from massive, unstructured data sources (e.g., Wikipedia-based methods such as DBpedia [9], YAGO [19], and Wikidata [22]; automated systems like Snowball [1], KnowItAll [5], NELL [4], and DeepDive [15]; and open-domain approaches like Open IE [2] and Universal Schema [14]), as well as in mining and reasoning over such knowledge bases to empower a wide variety of intelligent services, including question answering [6], recommender systems [3], and semantic search [8]. Automated construction, mining, and reasoning over knowledge bases have become possible thanks to research advances in many related areas such as information extraction, natural language processing, data mining, search, machine learning, databases, and data integration. However, there are still substantial scientific and engineering challenges in advancing and integrating the relevant methodologies. The goal of this proposed workshop is to gather leading experts from industry and academia to share their visions of the field, discuss the latest research results, and exchange exciting ideas. With a focus on invited talks and position papers, the workshop aims to provide a vivid forum for discussion of knowledge base-related research.
2. Relevance to WSDM. Knowledge base construction, mining, and reasoning are closely related to a wide variety of applications in WSDM, including web search, question answering, and recommender systems. Building a high-quality knowledge base from
North American Chapter of the Association for Computational Linguistics | 2018
Yu Su; Honglei Liu; Semih Yavuz; Izzeddin Gur; Huan Sun; Xifeng Yan
We study the problem of textual relation embedding with distant supervision. To combat the wrong labeling problem of distant supervision, we propose to embed textual relations with global statistics of relations, i.e., the co-occurrence statistics of textual and knowledge base relations collected from the entire corpus. This approach turns out to be more robust to the training noise introduced by distant supervision. On a popular relation extraction dataset, we show that the learned textual relation embedding can be used to augment existing relation extraction models and significantly improve their performance. Most remarkably, for the top 1,000 relational facts discovered by the best existing model, the precision can be improved from 83.9% to 89.3%.
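A minimal sketch of the global statistics this approach relies on, using toy distantly supervised pairs (the textual and KB relation strings are hypothetical). The paper learns textual relation embeddings against such statistics; this sketch only builds the normalized co-occurrence table:

```python
# Minimal sketch: count corpus-level co-occurrences of textual relations
# (e.g., dependency paths between entity mentions) and KB relations, then
# normalize so each textual relation maps to a distribution over KB relations.
from collections import Counter, defaultdict

# Hypothetical distantly supervised (textual relation, KB relation) pairs.
corpus_pairs = [
    ("subj <-born_in-> obj", "place_of_birth"),
    ("subj <-born_in-> obj", "place_of_birth"),
    ("subj <-born_in-> obj", "nationality"),      # noisy label from distant supervision
    ("subj <-founded-> obj", "founder_of"),
]

def global_cooccurrence(pairs):
    counts = defaultdict(Counter)
    for textual_rel, kb_rel in pairs:
        counts[textual_rel][kb_rel] += 1
    # Row-normalize: each textual relation yields a distribution over KB relations.
    return {t: {k: c / sum(ctr.values()) for k, c in ctr.items()}
            for t, ctr in counts.items()}

if __name__ == "__main__":
    for textual_rel, dist in global_cooccurrence(corpus_pairs).items():
        print(textual_rel, dist)
```

Aggregating over the entire corpus is what makes the signal robust: an occasional wrong label (like the nationality pair above) is outweighed by the dominant co-occurrence pattern.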
International ACM SIGIR Conference on Research and Development in Information Retrieval | 2018
Yu Su; Ahmed Hassan Awadallah; Miaosen Wang; Ryen W. White
The rapidly increasing ubiquity of computing puts a great demand on next-generation human-machine interfaces. Natural language interfaces, exemplified by virtual assistants like Apple Siri and Microsoft Cortana, are widely believed to be a promising direction. However, current natural language interfaces provide users with little help in case of incorrect interpretation of user commands. We hypothesize that the support of fine-grained user interaction can greatly improve the usability of natural language interfaces. In the specific setting of natural language interface to web APIs, we conduct a systematic study to verify our hypothesis. To facilitate this study, we propose a novel modular sequence-to-sequence model to create interactive natural language interfaces. By decomposing the complex prediction process of a typical sequence-to-sequence model into small, highly-specialized prediction units called modules, it becomes straightforward to explain the model prediction to the user, and solicit user feedback to correct possible prediction errors at a fine-grained level. We test our hypothesis by comparing an interactive natural language interface with its non-interactive version through both simulation and human subject experiments with real-world APIs. We show that with the interactive natural language interface, users can achieve a higher success rate and a lower task completion time, which lead to greatly improved user satisfaction.
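A minimal sketch of the modular prediction idea, with hypothetical module names and rule-based stand-ins for the learned sequence-to-sequence modules; the point is only that each module's output can be shown to the user and corrected independently:

```python
# Minimal sketch: each module predicts one piece of the API call, so the
# interface can explain individual predictions and accept per-module corrections.
from typing import Callable, Dict, Optional

def predict_filter(utterance: str) -> str:
    # Rule-based stand-in for a learned FILTER module.
    return "isRead eq false" if "unread" in utterance else ""

def predict_orderby(utterance: str) -> str:
    # Rule-based stand-in for a learned ORDERBY module.
    return "receivedDateTime desc" if "latest" in utterance or "recent" in utterance else ""

MODULES: Dict[str, Callable[[str], str]] = {"FILTER": predict_filter, "ORDERBY": predict_orderby}

def interpret(utterance: str, corrections: Optional[Dict[str, str]] = None) -> Dict[str, str]:
    """Run every module; a user correction overrides only that module's prediction."""
    corrections = corrections or {}
    return {name: corrections.get(name, module(utterance)) for name, module in MODULES.items()}

if __name__ == "__main__":
    utterance = "show my unread emails"
    print(interpret(utterance))                                        # inspectable, per-module prediction
    print(interpret(utterance, {"ORDERBY": "receivedDateTime desc"}))  # user fixes a single module
```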
Empirical Methods in Natural Language Processing | 2017
Semih Yavuz; Izzeddin Gur; Yu Su; Xifeng Yan
The existing factoid QA systems often lack a post-inspection component that can help models recover from their own mistakes. In this work, we propose to cross-check the corresponding KB relations behind the predicted answers and identify potential inconsistencies. Instead of developing a new model that accepts evidence collected from these relations, we choose to plug them back into the original questions directly and check whether the revised question makes sense or not. A bidirectional LSTM is applied to encode revised questions. We develop a scoring mechanism over the revised question encodings to refine the predictions of a base QA system. This approach can improve the F1 score of STAGG (Yih et al., 2015), one of the leading QA systems, from 52.5% to 53.9% on WEBQUESTIONS data.
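A minimal sketch of the revise-and-rerank idea, with a word-overlap function as a hypothetical stand-in for the learned bidirectional LSTM scorer (the question, answers, and relation surface forms are illustrative only):

```python
# Minimal sketch: verbalize the KB relation behind each candidate answer, plug it
# back into the question, and re-rank candidates by how plausible the revised
# question looks. `plausibility` is only a toy stand-in for the learned scorer.

def revise(question: str, relation_surface: str) -> str:
    # Plug the KB relation's surface form back into the original question.
    return f"{question}, as in {relation_surface}"

def plausibility(revised_question: str) -> float:
    # Hypothetical stand-in for the BiLSTM scorer: reward word overlap between
    # the question part and the plugged-in relation part.
    question_part, relation_part = revised_question.split(", as in ")
    q_words, r_words = set(question_part.split()), set(relation_part.split())
    return len(q_words & r_words) / max(len(r_words), 1)

def rerank(question: str, candidates):
    """candidates: (answer, KB relation surface form) pairs from a base QA system."""
    scored = [(answer, plausibility(revise(question, relation))) for answer, relation in candidates]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)

if __name__ == "__main__":
    question = "who is the founder of microsoft"
    candidates = [("Bill Gates", "organization founder"),
                  ("Redmond", "organization headquarters")]
    print(rerank(question, candidates))   # the candidate whose relation fits the question ranks first
```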
Empirical Methods in Natural Language Processing | 2017
Jie Zhao; Yu Su; Ziyu Guan; Huan Sun
SIAM International Conference on Data Mining | 2018
Keqian Li; Hanwen Zha; Yu Su; Xifeng Yan
Meeting of the Association for Computational Linguistics | 2018
Izzeddin Gur; Semih Yavuz; Yu Su; Xifeng Yan
Empirical Methods in Natural Language Processing | 2018
Wenhu Chen; Jianshu Chen; Yu Su; Xin Wang; Dong Yu; Xifeng Yan; William Yang Wang
Empirical Methods in Natural Language Processing | 2018
Semih Yavuz; Izzeddin Gur; Yu Su; Xifeng Yan