Robbie Haertel
Brigham Young University
Publications
Featured research published by Robbie Haertel.
Linguistic Annotation Workshop | 2007
Eric K. Ringger; Peter McClanahan; Robbie Haertel; George Busby; Marc Carmen; James L. Carroll; Kevin D. Seppi; Deryle Lonsdale
In the construction of a part-of-speech annotated corpus, we are constrained by a fixed budget. A fully annotated corpus is required, but we can afford to label only a subset. We train a Maximum Entropy Markov Model tagger from a labeled subset and automatically tag the remainder. This paper addresses the question of where to focus our manual tagging efforts in order to deliver an annotation of highest quality. In this context, we find that active learning is always helpful. We focus on Query by Uncertainty (QBU) and Query by Committee (QBC) and report on experiments with several baselines and new variations of QBC and QBU, inspired by weaknesses particular to their use in this application. Experiments on English prose and poetry test these approaches and evaluate their robustness. The results allow us to make recommendations for both types of text and raise questions that will lead to further inquiry.
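To make the Query by Uncertainty idea concrete, the sketch below ranks unlabeled sentences by the summed per-token entropy of the tagger's predicted tag distributions and selects the most uncertain ones for manual annotation. This is a minimal illustration, not the paper's implementation: `tag_distributions_for` is a hypothetical callback standing in for a trained tagger (the paper uses a Maximum Entropy Markov Model and also studies committee-based variants).

```python
import math

def sentence_uncertainty(token_tag_distributions):
    # Sum of per-token tag entropies; a higher score means the current
    # tagger is less certain about the sentence as a whole.
    total = 0.0
    for dist in token_tag_distributions:
        total += -sum(p * math.log(p) for p in dist.values() if p > 0.0)
    return total

def query_by_uncertainty(unlabeled_sentences, tag_distributions_for, batch_size):
    # Rank unlabeled sentences by uncertainty and return the top batch
    # for manual annotation; the rest stay machine-tagged.
    scored = sorted(unlabeled_sentences,
                    key=lambda s: sentence_uncertainty(tag_distributions_for(s)),
                    reverse=True)
    return scored[:batch_size]
```

Query by Committee replaces the single-model uncertainty score with a disagreement measure (e.g., vote entropy) computed over several taggers trained on resampled data.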
Meeting of the Association for Computational Linguistics | 2008
Robbie Haertel; Eric K. Ringger; Kevin D. Seppi; James Carroll; Peter McClanahan
Traditional Active Learning (AL) techniques assume that annotating each datum costs the same. This is not the case when annotating sequences; some sequences take longer than others. We show that the AL technique which performs best depends on how cost is measured. Applying an hourly cost model based on the results of an annotation user study, we approximate the amount of time necessary to annotate a given sentence. This model allows us to evaluate the effectiveness of AL sampling methods in terms of time spent in annotation. We achieve a 77% reduction in hours relative to a random baseline in reaching 96.5% tag accuracy on the Penn Treebank. More significantly, we make the case for measuring cost when assessing AL methods.
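The following sketch shows the general shape of an hourly cost model of the kind the abstract describes: estimated annotation time grows with sentence length and with the number of machine-proposed tags the annotator must correct, and the cost of a selected batch is accumulated in hours rather than in sentence counts. The functional form and coefficients here are placeholders for illustration only, not the values fit from the user study.

```python
def estimated_annotation_seconds(num_tokens, num_predicted_corrections,
                                 base=20.0, per_token=2.5, per_correction=6.0):
    # Hypothetical linear cost model: time grows with sentence length and
    # with the number of pre-annotated tags the annotator is expected to
    # correct. Coefficients are illustrative placeholders.
    return base + per_token * num_tokens + per_correction * num_predicted_corrections

def cumulative_annotation_hours(selected_sentences, predicted_corrections_for):
    # Total estimated annotation time, in hours, for a batch chosen by an
    # AL method; this becomes the cost axis of the learning curve.
    seconds = sum(
        estimated_annotation_seconds(len(tokens), predicted_corrections_for(tokens))
        for tokens in selected_sentences)
    return seconds / 3600.0
```

Plotting tagger accuracy against this cumulative-hours axis, rather than against the number of annotated sentences, is what allows different AL strategies to be compared on cost.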
North American Chapter of the Association for Computational Linguistics | 2015
Paul Felt; Kevin Black; Eric K. Ringger; Kevin D. Seppi; Robbie Haertel
In modern practice, labeling a dataset often involves aggregating annotator judgments obtained from crowdsourcing. State-of-the-art aggregation is performed via inference on probabilistic models, some of which are data-aware, meaning that they leverage features of the data (e.g., words in a document) in addition to annotator judgments. Previous work largely prefers discriminatively trained conditional models. This paper demonstrates that a data-aware crowdsourcing model incorporating a generative multinomial data model enjoys a strong competitive advantage over its discriminative log-linear counterpart in the typical crowdsourcing setting. That is, the generative approach is better except when the annotators are highly accurate, in which case simple majority vote is often sufficient. Additionally, we present a novel mean-field variational inference algorithm for the generative model that significantly improves on the previously reported state of the art for that model. We validate our conclusions on six text classification datasets with both human-generated and synthetic annotations.
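For reference, the majority-vote baseline mentioned in the abstract is the simplest aggregation strategy: each item receives the label most of its annotators chose. The sketch below is only that baseline; the data-aware generative model discussed in the paper instead performs probabilistic inference over annotator reliability and the document's words.

```python
from collections import Counter

def majority_vote(judgments_by_item):
    # judgments_by_item maps an item id to the list of labels it received
    # from different annotators. Returns one aggregated label per item;
    # ties are broken arbitrarily by Counter's internal ordering.
    return {item: Counter(labels).most_common(1)[0][0]
            for item, labels in judgments_by_item.items()}

# Example: three annotators label two short documents.
aggregated = majority_vote({
    "doc1": ["positive", "positive", "negative"],
    "doc2": ["negative", "negative", "negative"],
})
# aggregated == {"doc1": "positive", "doc2": "negative"}
```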
Linguistic Annotation Workshop | 2015
Robbie Haertel; Eric K. Ringger; Kevin D. Seppi; Paul Felt
Return-on-Investment (ROI) is a cost-conscious approach to active learning (AL) that considers both estimates of cost and estimates of benefit in active sample selection. We investigate the theoretical conditions for successful cost-conscious AL using ROI by examining the conditions under which ROI would optimize the area under the cost/benefit curve. We then empirically measure the degree to which optimality is jeopardized in practice when those conditions are violated. The reported experiments involve an English part-of-speech annotation task. Our results show that ROI can indeed successfully reduce total annotation costs and should be considered a viable option for machine-assisted annotation. On the basis of our experiments, we make recommendations for the benefit estimators to be employed in ROI. In particular, we find that the more linearly related a benefit estimate is to the true benefit, the better the estimate performs when paired in ROI with an imperfect cost estimate. Lastly, we apply our analysis to help explain the mixed results of previous work on these questions.
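As a rough illustration of ROI-based selection, the sketch below ranks candidates by estimated benefit per unit of estimated cost and annotates the top of the ranking. The estimators are assumed to be supplied by the caller (for example, model uncertainty as benefit and predicted annotation time as cost); how errors in those estimates affect the quality of the resulting cost/benefit curve is the question the paper examines.

```python
def roi_select(candidates, estimate_benefit, estimate_cost, batch_size):
    # Rank candidate sentences by estimated return on investment:
    # expected benefit per unit of estimated annotation cost.
    scored = sorted(candidates,
                    key=lambda x: estimate_benefit(x) / max(estimate_cost(x), 1e-9),
                    reverse=True)
    return scored[:batch_size]
```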
Language Resources and Evaluation | 2008
Eric K. Ringger; Marc Carmen; Robbie Haertel; Kevin D. Seppi; Deryle Lonsdale; Peter McClanahan; James L. Carroll; Noel Ellison
North American Chapter of the Association for Computational Linguistics | 2010
Robbie Haertel; Paul Felt; Eric K. Ringger; Kevin D. Seppi
Language Resources and Evaluation | 2010
Marc Carmen; Paul Felt; Robbie Haertel; Deryle Lonsdale; Peter McClanahan; Owen Merkling; Eric K. Ringger; Kevin D. Seppi
Empirical Methods in Natural Language Processing | 2010
Peter McClanahan; George Busby; Robbie Haertel; Kristian Heal; Deryle Lonsdale; Kevin D. Seppi; Eric K. Ringger
Language Resources and Evaluation | 2010
Paul Felt; Owen Merkling; Marc Carmen; Eric K. Ringger; Warren Lemmon; Kevin D. Seppi; Robbie Haertel
Language Resources and Evaluation | 2014
Paul Felt; Eric K. Ringger; Kevin D. Seppi; Kristian Heal; Robbie Haertel; Deryle Lonsdale