Publication


Featured research published by Eric K. Ringger.


Linguistic Annotation Workshop | 2007

Active Learning for Part-of-Speech Tagging: Accelerating Corpus Annotation

Eric K. Ringger; Peter McClanahan; Robbie Haertel; George Busby; Marc Carmen; James L. Carroll; Kevin D. Seppi; Deryle Lonsdale

In the construction of a part-of-speech annotated corpus, we are constrained by a fixed budget. A fully annotated corpus is required, but we can afford to label only a subset. We train a Maximum Entropy Markov Model tagger from a labeled subset and automatically tag the remainder. This paper addresses the question of where to focus our manual tagging efforts in order to deliver an annotation of highest quality. In this context, we find that active learning is always helpful. We focus on Query by Uncertainty (QBU) and Query by Committee (QBC) and report on experiments with several baselines and new variations of QBC and QBU, inspired by weaknesses particular to their use in this application. Experiments on English prose and poetry test these approaches and evaluate their robustness. The results allow us to make recommendations for both types of text and raise questions that will lead to further inquiry.
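
As a rough illustration of the two selection strategies compared above (not the authors' MEMM-based implementation), the sketch below scores unlabeled sentences by per-token entropy for QBU and by committee disagreement for QBC; the model methods `tag_marginals` and `best_tags` are assumed interfaces, not real library calls.

```python
import math
from collections import Counter

def token_entropy(dist):
    """Entropy of a single token's tag distribution, given as {tag: prob}."""
    return -sum(p * math.log(p) for p in dist.values() if p > 0)

def qbu_score(model, sentence):
    """Query by Uncertainty: mean per-token entropy under a single model.
    Assumes a hypothetical model.tag_marginals(sentence) returning one
    {tag: prob} dict per token."""
    marginals = model.tag_marginals(sentence)
    return sum(token_entropy(d) for d in marginals) / len(marginals)

def qbc_score(committee, sentence):
    """Query by Committee: mean per-token vote entropy over the committee
    members' single-best taggings (model.best_tags is also assumed)."""
    taggings = [m.best_tags(sentence) for m in committee]
    total_disagreement = 0.0
    for votes_at_position in zip(*taggings):
        votes = Counter(votes_at_position)
        n = sum(votes.values())
        total_disagreement += token_entropy({t: c / n for t, c in votes.items()})
    return total_disagreement / len(taggings[0])

def select_batch(scorer, unlabeled_pool, batch_size=10):
    """Pick the highest-scoring sentences for manual annotation this round."""
    return sorted(unlabeled_pool, key=scorer, reverse=True)[:batch_size]
```

In a budget-constrained loop, select_batch(lambda s: qbu_score(tagger, s), pool) would be called each round, the chosen sentences hand-corrected, and the tagger retrained before the next round.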


Meeting of the Association for Computational Linguistics | 2008

Assessing the Costs of Sampling Methods in Active Learning for Annotation

Robbie Haertel; Eric K. Ringger; Kevin D. Seppi; James Carroll; Peter McClanahan

Traditional Active Learning (AL) techniques assume that the annotation of each datum costs the same. This is not the case when annotating sequences; some sequences will take longer than others. We show that the AL technique which performs best depends on how cost is measured. Applying an hourly cost model based on the results of an annotation user study, we approximate the amount of time necessary to annotate a given sentence. This model allows us to evaluate the effectiveness of AL sampling methods in terms of time spent in annotation. We achieve a 77% reduction in hours over a random baseline while reaching 96.5% tag accuracy on the Penn Treebank. More significantly, we make the case for measuring cost in assessing AL methods.
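
The core idea is to plot tagger accuracy against estimated annotation time rather than against the number of annotated sentences. A minimal sketch, assuming a toy linear hourly cost model whose coefficients are placeholders rather than the fitted values from the user study:

```python
def estimated_hours(n_tokens, n_corrections,
                    base_secs=5.0, secs_per_token=2.0, secs_per_correction=4.0):
    """Toy linear cost model: annotation time grows with sentence length and
    with the number of pre-tagged tokens the annotator must correct.
    The coefficients here are placeholders, not fitted user-study values."""
    secs = base_secs + secs_per_token * n_tokens + secs_per_correction * n_corrections
    return secs / 3600.0

def cost_accuracy_curve(annotation_log):
    """Convert a log of (n_tokens, n_corrections, tagger_accuracy_after)
    records, one per annotated sentence, into an (hours, accuracy) curve
    suitable for comparing AL sampling methods by time rather than by count."""
    hours, curve = 0.0, []
    for n_tokens, n_corrections, accuracy in annotation_log:
        hours += estimated_hours(n_tokens, n_corrections)
        curve.append((hours, accuracy))
    return curve
```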


Language and Technology Conference | 2006

Multilingual Dependency Parsing using Bayes Point Machines

Simon Corston-Oliver; Anthony Aue; Kevin Duh; Eric K. Ringger

We develop dependency parsers for Arabic, English, Chinese, and Czech using Bayes Point Machines, a training algorithm which is as easy to implement as the perceptron yet competitive with large margin methods. We achieve results comparable to state-of-the-art in English and Czech, and report the first directed dependency parsing accuracies for Arabic and Chinese. Given the multilingual nature of our experiments, we discuss some issues regarding the comparison of dependency parsers for different languages.
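
A Bayes Point Machine of this kind can be implemented as little more than perceptron averaging: train several perceptrons on different random orderings of the training data and average their weight vectors to approximate the Bayes point. The sketch below covers only the binary linear case and is not the authors' multilingual dependency parser:

```python
import random
import numpy as np

def train_perceptron(X, y, epochs=5, rng=None):
    """Plain binary perceptron; X is an (n, d) array, y has labels in {-1, +1}."""
    rng = rng or random.Random(0)
    w = np.zeros(X.shape[1])
    order = list(range(len(y)))
    for _ in range(epochs):
        rng.shuffle(order)                 # each committee member sees a different ordering
        for i in order:
            if y[i] * (w @ X[i]) <= 0:     # mistake-driven update
                w += y[i] * X[i]
    return w

def train_bayes_point_machine(X, y, n_members=10, epochs=5, seed=0):
    """Approximate the Bayes point by averaging the weight vectors of
    perceptrons trained on different random orderings of the same data."""
    members = [train_perceptron(X, y, epochs, random.Random(seed + k))
               for k in range(n_members)]
    return np.mean(members, axis=0)

def predict(w, X):
    """Sign of the averaged linear score."""
    return np.sign(X @ w)
```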


North American Chapter of the Association for Computational Linguistics | 2015

Is Your Anchor Going Up or Down? Fast and Accurate Supervised Topic Models

Thang Nguyen; Jordan L. Boyd-Graber; Jeffrey Lund; Kevin D. Seppi; Eric K. Ringger

Topic models provide insights into document collections, and their supervised extensions also capture associated document-level metadata such as sentiment. However, inferring such models from data is often slow and cannot scale to big data. We build upon the “anchor” method for learning topic models to capture the relationship between metadata and latent topics by extending the vector-space representation of word co-occurrence to include metadata-specific dimensions. These additional dimensions reveal new anchor words that reflect specific combinations of metadata and topic. We show that these new latent representations predict sentiment as accurately as supervised topic models, and we find these representations more quickly without sacrificing interpretability.
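
A simplified sketch of the central construction, assuming bag-of-words count data: each word's conditional co-occurrence vector is augmented with metadata-specific dimensions before a greedy anchor search is run. This is a toy illustration, not the paper's implementation or the full anchor-based recovery algorithm:

```python
import numpy as np

def augmented_cooccurrence(doc_word_counts, doc_metadata, n_metadata_values):
    """Row-normalized word co-occurrence vectors with extra metadata-specific
    dimensions appended (roughly p(metadata value | word)).

    doc_word_counts: (D, V) array of word counts per document.
    doc_metadata:    length-D array of integer metadata labels (e.g. sentiment).
    """
    D, V = doc_word_counts.shape
    Q = (doc_word_counts.T @ doc_word_counts).astype(float)   # (V, V) co-occurrence
    np.fill_diagonal(Q, 0.0)                                   # drop self-counts
    M = np.zeros((V, n_metadata_values))                       # (V, m) word-metadata counts
    for d in range(D):
        M[:, doc_metadata[d]] += doc_word_counts[d]
    QM = np.hstack([Q, M])
    QM /= QM.sum(axis=1, keepdims=True) + 1e-12                # condition on the word
    return QM

def find_anchors(QM, k):
    """Greedy farthest-point search: repeatedly pick the word whose conditional
    vector has the largest component outside the span of the anchors so far."""
    anchors = []
    residual = QM.copy()
    for _ in range(k):
        idx = int(np.argmax(np.linalg.norm(residual, axis=1)))
        anchors.append(idx)
        v = residual[idx] / (np.linalg.norm(residual[idx]) + 1e-12)
        residual -= np.outer(residual @ v, v)                  # project out that direction
    return anchors
```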


North American Chapter of the Association for Computational Linguistics | 2015

Early Gains Matter: A Case for Preferring Generative over Discriminative Crowdsourcing Models

Paul Felt; Kevin Black; Eric K. Ringger; Kevin D. Seppi; Robbie Haertel

In modern practice, labeling a dataset often involves aggregating annotator judgments obtained from crowdsourcing. State-of-the-art aggregation is performed via inference on probabilistic models, some of which are data-aware, meaning that they leverage features of the data (e.g., words in a document) in addition to annotator judgments. Previous work largely prefers discriminatively trained conditional models. This paper demonstrates that a data-aware crowdsourcing model incorporating a generative multinomial data model enjoys a strong competitive advantage over its discriminative log-linear counterpart in the typical crowdsourcing setting. That is, the generative approach is better except when the annotators are highly accurate, in which case simple majority vote is often sufficient. Additionally, we present a novel mean-field variational inference algorithm for the generative model that significantly improves on the previously reported state-of-the-art for that model. We validate our conclusions on six text classification datasets with both human-generated and synthetic annotations.
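
The paper's models are data-aware, but the contrast it draws can be illustrated with simpler label-only baselines: majority vote versus a small generative item-response aggregator (in the Dawid-Skene style) fit with EM, which learns a confusion matrix per annotator. A minimal sketch, not the paper's model or its variational inference algorithm:

```python
import numpy as np

def majority_vote(labels):
    """labels: (n_items, n_annotators) int array; -1 marks a missing judgment."""
    n_classes = labels.max() + 1
    votes = np.stack([(labels == c).sum(axis=1) for c in range(n_classes)], axis=1)
    return votes.argmax(axis=1)

def generative_aggregation(labels, n_iters=50):
    """EM for a simple generative item-response model (Dawid-Skene style):
    each item has a latent true class, each annotator a confusion matrix."""
    n_items, n_annotators = labels.shape
    n_classes = labels.max() + 1
    post = np.zeros((n_items, n_classes))          # posterior over true classes
    post[np.arange(n_items), majority_vote(labels)] = 1.0
    for _ in range(n_iters):
        # M-step: class prior and per-annotator confusion matrices (smoothed)
        prior = post.mean(axis=0)
        conf = np.full((n_annotators, n_classes, n_classes), 1e-6)
        for a in range(n_annotators):
            seen = labels[:, a] >= 0
            for c in range(n_classes):
                conf[a, :, c] += post[seen][labels[seen, a] == c].sum(axis=0)
        conf /= conf.sum(axis=2, keepdims=True)
        # E-step: posterior over true classes given all observed judgments
        log_post = np.tile(np.log(prior + 1e-12), (n_items, 1))
        for a in range(n_annotators):
            seen = labels[:, a] >= 0
            log_post[seen] += np.log(conf[a, :, labels[seen, a]])
        post = np.exp(log_post - log_post.max(axis=1, keepdims=True))
        post /= post.sum(axis=1, keepdims=True)
    return post.argmax(axis=1)
```

A data-aware generative variant would additionally model the words of each document given the latent class; the sketch above omits that component.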


Language Resources and Evaluation | 2014

Evaluating machine-assisted annotation in under-resourced settings

Paul Felt; Eric K. Ringger; Kevin D. Seppi; Kristian Heal; Robbie Haertel; Deryle Lonsdale

Machine assistance is vital to managing the cost of corpus annotation projects. Identifying effective forms of machine assistance through principled evaluation is particularly important and challenging in under-resourced domains and highly heterogeneous corpora, as the quality of machine assistance varies. We perform a fine-grained evaluation of two machine-assistance techniques in the context of an under-resourced corpus annotation project. This evaluation requires a carefully controlled user study crafted to test a number of specific hypotheses. We show that human annotators performing morphological analysis of text in a Semitic language perform their task significantly more accurately and quickly when even mediocre pre-annotations are provided. When pre-annotations are at least 70% accurate, annotator speed and accuracy show statistically significant relative improvements of 25–35% and 5–7%, respectively. However, controlled user studies are too costly to be suitable for under-resourced corpus annotation projects. Thus, we also present an alternative analysis methodology that models the data as a combination of latent variables in a Bayesian framework. We show that modeling the effects of interesting confounding factors can generate useful insights. In particular, correction propagation appears to be most effective for our task when implemented with minimal user involvement. More importantly, by explicitly accounting for confounding variables, this approach has the potential to yield fine-grained evaluations using data collected in a natural environment outside of costly controlled user studies.
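
For intuition only, the snippet below runs the kind of paired significance test such a user study relies on, using explicitly made-up per-annotator timings; it is not the study's data, nor its Bayesian latent-variable analysis:

```python
import numpy as np
from scipy import stats

# Explicitly made-up per-annotator mean annotation times (seconds per word),
# paired: the same annotators working with and without pre-annotations.
with_pre = np.array([4.1, 3.8, 5.0, 4.4, 3.9, 4.7])
without_pre = np.array([5.6, 4.9, 6.3, 5.8, 5.1, 6.0])

t_stat, p_value = stats.ttest_rel(without_pre, with_pre)
relative_speedup = (without_pre - with_pre).mean() / without_pre.mean()
print(f"paired t = {t_stat:.2f}, p = {p_value:.4f}, "
      f"relative speedup = {relative_speedup:.1%}")
```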


Conference on Computational Natural Language Learning | 2015

Making the Most of Crowdsourced Document Annotations: Confused Supervised LDA

Paul Felt; Eric K. Ringger; Jordan L. Boyd-Graber; Kevin D. Seppi

Corpus labeling projects frequently use low-cost workers from microtask marketplaces; however, these workers are often inexperienced or have misaligned incentives. Crowdsourcing models must be robust to the resulting systematic and nonsystematic inaccuracies. We introduce a novel crowdsourcing model that adapts the discrete supervised topic model sLDA to handle multiple corrupt, usually conflicting (hence “confused”) supervision signals. Our model achieves significant gains over previous work in the accuracy of deduced ground truth.


Linguistic Annotation Workshop | 2015

An Analytic and Empirical Evaluation of Return-on-Investment-Based Active Learning

Robbie Haertel; Eric K. Ringger; Kevin D. Seppi; Paul Felt

Return-on-Investment (ROI) is a cost-conscious approach to active learning (AL) that considers both estimates of cost and of benefit in active sample selection. We investigate the theoretical conditions for successful cost-conscious AL using ROI by examining the conditions under which ROI would optimize the area under the cost/benefit curve. We then empirically measure the degree to which optimality is jeopardized in practice when the conditions are violated. The reported experiments involve an English part-of-speech annotation task. Our results show that ROI can indeed successfully reduce total annotation costs and should be considered as a viable option for machine-assisted annotation. On the basis of our experiments, we make recommendations for benefit estimators to be employed in ROI. In particular, we find that the more linearly related a benefit estimate is to the true benefit, the better the estimate performs when paired in ROI with an imperfect cost estimate. Lastly, we apply our analysis to help explain the mixed results of previous work on these questions.
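
A minimal sketch of the ROI selection rule itself, with the cost and benefit estimators left as caller-supplied (hypothetical) functions:

```python
def roi_select(unlabeled_pool, estimate_benefit, estimate_cost, batch_size=10):
    """Return-on-Investment sampling: rank candidates by estimated benefit per
    unit of estimated cost and annotate the top of the ranking. The two
    estimators are caller-supplied, e.g. a model-uncertainty benefit proxy
    and an hourly cost model (both hypothetical here)."""
    def roi(item):
        return estimate_benefit(item) / max(estimate_cost(item), 1e-9)
    return sorted(unlabeled_pool, key=roi, reverse=True)[:batch_size]
```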


Language Resources and Evaluation | 2008

Assessing the Costs of Machine-Assisted Corpus Annotation through a User Study

Eric K. Ringger; Marc Carmen; Robbie Haertel; Kevin D. Seppi; Deryle Lonsdale; Peter McClanahan; James L. Carroll; Noel Ellison


North American Chapter of the Association for Computational Linguistics | 2010

Parallel Active Learning: Eliminating Wait Time with Minimal Staleness

Robbie Haertel; Paul Felt; Eric K. Ringger; Kevin D. Seppi

Collaboration


Dive into Eric K. Ringger's collaborations.

Top Co-Authors

Kevin D. Seppi (Brigham Young University)
Robbie Haertel (Brigham Young University)
Paul Felt (Brigham Young University)
Kristian Heal (Brigham Young University)
Marc Carmen (Brigham Young University)
George Busby (Brigham Young University)
James Carroll (Brigham Young University)