Publication


Featured research published by David Francois Huynh.


Linguistic Annotation Workshop | 2015

Scaling Semantic Frame Annotation

Nancy Chang; Praveen Paritosh; David Francois Huynh; Collin F. Baker

Large-scale data resources needed for progress toward natural language understanding are not yet widely available and typically require considerable expense and expertise to create. This paper addresses the problem of developing scalable approaches to annotating semantic frames and explores the viability of crowdsourcing for the task of frame disambiguation. We present a novel supervised crowdsourcing paradigm that incorporates insights from human computation research designed to accommodate the relative complexity of the task, such as exemplars and real-time feedback. We show that non-experts can be trained to perform accurate frame disambiguation, and can even identify errors in gold data used as the training exemplars. Results demonstrate the efficacy of this paradigm for semantic annotation requiring an intermediate level of expertise.

1 The semantic bottleneck

Behind every great success in speech and language lies a great corpus, or at least a very large one. Advances in speech recognition, machine translation and syntactic parsing can be traced to the availability of large-scale annotated resources (Wall Street Journal, Europarl and Penn Treebank, respectively) providing crucial supervised input to statistically learned models. Semantically annotated resources have been comparatively harder to come by: representing meaning poses myriad philosophical, theoretical and practical challenges, particularly for general purpose resources that can be applied to diverse domains. If these challenges can be addressed, however, semantic resources hold significant potential for fueling progress beyond shallow syntax and toward deeper language understanding.

This paper explores the feasibility of developing scalable methodologies for semantic annotation, inspired by three strands of work. First, frame semantics, and its instantiation in the Berkeley FrameNet project (Fillmore and Baker, 2010), offers a principled approach to representing meaning. FrameNet is a lexicographic resource that captures syntactic and semantic generalizations that go beyond surface form and part of speech, famously including the relationships among words like buy, sell, purchase and price. These rich structural relations provide an attractive foundation for work in deeper natural language understanding and inference, as attested by the breadth of applications at the Workshop in Honor of Chuck Fillmore at ACL 2014 (Petruck and de Melo, 2014). But FrameNet was not designed to support scalable language technologies; indeed, it is perhaps a paradigm example of a hand-curated knowledge resource, one that has required significant expertise, training, time and expense to create and that remains under development.

Second, the task of automatic semantic role labeling (ASRL) (Gildea and Jurafsky, 2002) serves as an applied counterpart to the ideas of frame semantics. Recent progress has demonstrated the viability of training automated models using frame-annotated data (Das et al., 2013; Das et al., 2010; Johansson and Nugues, 2006). Results based on FrameNet data have been limited by its incomplete


ACM Transactions on Intelligent Systems and Technology | 2016

Crowdsourcing Human Annotation on Web Page Structure: Infrastructure Design and Behavior-Based Quality Control

Shuguang Han; Peng Dai; Praveen Paritosh; David Francois Huynh

Parsing the semantic structure of a web page is a key component of web information extraction. Successful extraction algorithms usually require large-scale training and evaluation datasets, which are difficult to acquire. Recently, crowdsourcing has proven to be an effective method of collecting large-scale training data in domains that do not require much domain knowledge. For more complex domains, researchers have proposed sophisticated quality control mechanisms to replicate tasks in parallel or sequential ways and then aggregate responses from multiple workers. Conventional annotation integration methods often put more trust in the workers with high historical performance; thus, they are called performance-based methods. Recently, Rzeszotarski and Kittur have demonstrated that behavioral features are also highly correlated with annotation quality in several crowdsourcing applications. In this article, we present a new crowdsourcing system, called Wernicke, to provide annotations for web information extraction. Wernicke collects a wide set of behavioral features and, based on these features, predicts annotation quality for a challenging task domain: annotating web page structure. We evaluate the effectiveness of quality control using behavioral features through a case study where 32 workers annotate 200 Q&A web pages from five popular websites. In doing so, we discover several things: (1) Many behavioral features are significant predictors for crowdsourcing quality. (2) The behavioral-feature-based method outperforms performance-based methods in recall prediction, while performing equally with precision prediction. In addition, using behavioral features is less vulnerable to the cold-start problem, and the corresponding prediction model is more generalizable for predicting recall than precision for cross-website quality analysis. (3) One can effectively combine workers’ behavioral information and historical performance information to further reduce prediction errors.


Archive | 2012

Search Result Ranking and Presentation

Chen Zhou; Chen Ding; David Francois Huynh; Jinyu Lou; Yanlai Huang; Hongda Shen; Guanghua Li; Yiming Li; Yangyang Chai


Archive | 2013

Generating insightful connections between graph entities

David Francois Huynh; Guanghua Li; Chen Ding; Yanlai Huang; Ying Chai; Liang Hu; Jingxu Chen


Archive | 2012

Ranking search results based on entity metrics

Hongda Shen; David Francois Huynh; Grace Chung; Chen Zhou; Yanlai Huang; Guanghua Li


Archive | 2012

Clustered Search Results

David Francois Huynh; Qianhao Qiu


Archive | 2012

Providing Search Results Based on a Compositional Query

Jinyu Lou; Ying Chai; Chen Ding; Lijie Chen; Liang Hu; Kejia Liu; Weibin Pan; Yanlai Huang; David Francois Huynh


Archive | 2013

Natural Language Processing Based Search

Guanghua Li; David Francois Huynh; Yanlai Huang; Yuan Gao; Ying Chai; Manish Rai Jain; Yong Zhang


Archive | 2012

Related Entity Search

Ying Chai; David Francois Huynh; Jinyu Lou; Chen Ding


Language Resources and Evaluation | 2014

A Database for Measuring Linguistic Information Content

Richard Sproat; Bruno Cartoni; HyunJeong Choe; David Francois Huynh; Linne Ha; Ravindran Rajakumar; Evelyn Wenzel-Grondie
