Ben Wellner
MITRE Corporation
Publications
Featured research published by Ben Wellner.
Meeting of the Association for Computational Linguistics | 2006
Inderjeet Mani; Marc Verhagen; Ben Wellner; Chong Min Lee; James Pustejovsky
This paper investigates a machine learning approach for temporally ordering and anchoring events in natural language texts. To address data sparseness, we used temporal reasoning as an over-sampling method to dramatically expand the amount of training data, resulting in predictive accuracy on link labeling as high as 93% using a Maximum Entropy classifier on human annotated data. This method compared favorably against a series of increasingly sophisticated baselines involving expansion of rules derived from human intuitions.
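The expansion step can be pictured as computing a closure over the annotated links before training the classifier. Below is a minimal sketch of that idea in Python: it only propagates transitivity of BEFORE links, whereas the temporal reasoning used in the paper covers a richer set of relations, and the event names are placeholders.

```python
# Toy stand-in for the temporal-reasoning expansion: propagate only the
# transitivity of BEFORE over a small set of annotated links.
from itertools import product

def close_before(links):
    """links: set of (e1, e2) pairs meaning e1 BEFORE e2.
    Returns the transitive closure of those links."""
    closed = set(links)
    changed = True
    while changed:
        changed = False
        for (a, b), (c, d) in product(list(closed), repeat=2):
            if b == c and (a, d) not in closed:
                closed.add((a, d))
                changed = True
    return closed

seed = {("e1", "e2"), ("e2", "e3"), ("e3", "e4")}
expanded = close_before(seed)
print(len(seed), "annotated links ->", len(expanded), "training links")
# The expanded pairs, featurized (tense, aspect, event class, etc.),
# would then feed a Maximum Entropy / logistic-regression link classifier.
```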
Annual Meeting of the Special Interest Group on Discourse and Dialogue | 2009
Ben Wellner; James Pustejovsky; Catherine Havasi; Anna Rumshisky; Roser Saurí
In this paper we consider the problem of identifying and classifying discourse coherence relations. We report initial results over the recently released Discourse GraphBank (Wolf and Gibson, 2005). Our approach considers, and determines the contributions of, a variety of syntactic and lexico-semantic features. We achieve 81% accuracy on the task of discourse relation type classification and 70% accuracy on relation identification.
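As a rough illustration of the classification setup (not the paper's actual feature set or data), the sketch below featurizes a pair of discourse segments with a couple of toy lexical features and trains a standard logistic-regression classifier over relation labels.

```python
# Minimal sketch of relation-type classification over segment pairs.
# The features (first word of each segment, a word-pair conjunction) and
# the two training examples are invented for illustration.
from sklearn.feature_extraction import DictVectorizer
from sklearn.linear_model import LogisticRegression

def pair_features(seg1, seg2):
    w1, w2 = seg1.split(), seg2.split()
    feats = {"first1=" + w1[0].lower(): 1, "first2=" + w2[0].lower(): 1}
    feats["firstpair=%s_%s" % (w1[0].lower(), w2[0].lower())] = 1
    return feats

pairs = [("John fell.", "Because the ice was slick.", "cause-effect"),
         ("He bought milk.", "He also bought eggs.", "parallel")]
X = DictVectorizer().fit_transform([pair_features(a, b) for a, b, _ in pairs])
y = [label for _, _, label in pairs]
clf = LogisticRegression(max_iter=200).fit(X, y)
```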
Language Resources and Evaluation | 2010
Inderjeet Mani; Christy Doran; Dave Harris; Janet Hitzeman; Rob Quimby; Justin Richer; Ben Wellner; Scott A. Mardis; Seamus Clancy
SpatialML is an annotation scheme for marking up references to places in natural language. It covers both named and nominal references to places, grounding them where possible with geo-coordinates, and characterizes relationships among places in terms of a region calculus. A freely available annotation editor has been developed for SpatialML, along with several annotated corpora. Inter-annotator agreement on SpatialML extents is 91.3 F-measure on a corpus of SpatialML-annotated ACE documents released by the Linguistic Data Consortium. Disambiguation agreement on geo-coordinates on ACE is 87.93 F-measure. An automatic tagger for SpatialML extents scores 86.9 F on ACE, while a disambiguator scores 93.0 F on it. Results are also presented for two other corpora. In adapting the extent tagger to new domains, merging the training data from the ACE corpus with annotated data in the new domain provides the best performance.
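For readers unfamiliar with extent-level agreement figures, the sketch below shows one way such an F-measure can be computed by matching exact character spans between two annotators. The matching criterion and the example spans are illustrative assumptions, not the SpatialML scorer itself.

```python
# Sketch of extent-level inter-annotator agreement as an F-measure:
# one annotator's spans are treated as "gold", the other's as "system",
# and spans match only on identical character offsets.
def extent_f(gold_spans, test_spans):
    gold, test = set(gold_spans), set(test_spans)
    tp = len(gold & test)
    precision = tp / len(test) if test else 0.0
    recall = tp / len(gold) if gold else 0.0
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)

a1 = {(10, 18), (42, 49), (80, 95)}   # (start, end) character offsets
a2 = {(10, 18), (42, 50), (80, 95)}
print(round(extent_f(a1, a2), 3))     # 0.667: two of three spans match exactly
```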
Web Information and Data Management | 2007
John Gibson; Ben Wellner; Susan Lubar
Identifying which parts of a Web page contain target content (e.g., the portion of an online news page that contains the actual article) is a significant problem that must be addressed for many Web-based applications. Most approaches to this problem involve crafting hand-tailored rules or scripts to extract the content, customized separately for particular Web sites. Besides requiring considerable time and effort to implement, hand-built extraction routines are brittle: they fail to properly extract content in some cases and break when the structure of a site's Web pages changes. In this work we treat the problem of identifying content as a sequence labeling problem, a common problem structure in machine learning and natural language processing. Using a Conditional Random Field sequence labeling model, we correctly identify the content portion of Web pages anywhere from 80-97% of the time, depending on experimental factors such as ensuring the absence of duplicate documents and application of the model against unseen sources.
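A minimal sketch of this framing, assuming the third-party sklearn-crfsuite package and a toy feature set (the paper specifies neither this library nor these features): each block of a page becomes a token in a sequence and is tagged Content or NotContent.

```python
# Content extraction as sequence labeling over page blocks.
# Library, features, and labels here are illustrative assumptions.
import sklearn_crfsuite

def block_features(block):
    text = block["text"]
    return {
        "tag": block["tag"],                 # enclosing HTML tag
        "num_words": str(min(len(text.split()), 20)),
        "has_link": str("href" in block.get("html", "")),
        "ends_period": str(text.strip().endswith(".")),
    }

page = [{"tag": "div", "text": "Home | News | Sports", "html": '<a href="/">'},
        {"tag": "p", "text": "The city council voted on Tuesday to..."},
        {"tag": "p", "text": "Reporters were told the measure passed."},
        {"tag": "div", "text": "Copyright 2007. All rights reserved."}]
labels = ["NotContent", "Content", "Content", "NotContent"]

crf = sklearn_crfsuite.CRF(algorithm="lbfgs", c1=0.1, c2=0.1, max_iterations=50)
crf.fit([[block_features(b) for b in page]], [labels])
print(crf.predict([[block_features(b) for b in page]]))
```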
Natural Language Engineering | 2006
Ben Wellner; Lisa Ferro; Warren R. Greiff; Lynette Hirschman
Reading comprehension (RC) tests involve reading a short passage of text and answering a series of questions pertaining to that text. We present a methodology for evaluation of the application of modern natural language technologies to the task of responding to RC tests. Our work is based on ABCs (Abduction Based Comprehension system), an automated system for taking tests requiring short answer phrases as responses. A central goal of ABCs is to serve as a testbed for understanding the role that various linguistic components play in responding to reading comprehension questions. The heart of ABCs is an abductive inference engine that provides three key capabilities: (1) first-order logical representation of relations between entities and events in the text and rules to perform inference over such relations, (2) graceful degradation due to the inclusion of abduction in the reasoning engine, which avoids the brittleness that can be problematic in knowledge representation and reasoning systems, and (3) system transparency, such that the types of abductive inferences made over an entire corpus provide cues as to where the system is performing poorly and indications as to where existing knowledge is inaccurate or new knowledge is required. ABCs, with certain sub-components not yet automated, finds the correct answer phrase nearly 35 percent of the time using a strict evaluation metric and 45 percent of the time using a looser, inexact metric on held-out evaluation data. Performance varied across question types, ranging from over 50 percent on "who" questions to over 10 percent on "what" questions. We present analysis of the roles of individual components and analysis of the impact of various characteristics of the abductive proof procedure on overall system performance.
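The abductive core can be illustrated with a toy, propositional cost-based prover: goals that cannot be derived are assumed at a cost, and the cheapest explanation wins. The facts, rules, and costs below are invented for illustration; ABCs itself reasons over first-order relations between entities and events.

```python
# Toy weighted-abduction sketch illustrating graceful degradation:
# unprovable goals may be assumed at a cost, and the prover prefers the
# explanation with the lowest total assumption cost.
FACTS = {"slipped(john)"}                      # extracted from the passage
RULES = {"fell(john)": [["slipped(john)", "on_ice(john)"]]}
COSTS = {"on_ice(john)": 0.4}                  # cheap, plausible assumption
DEFAULT_COST = 1.0

def prove(goal, depth=0):
    """Return (cost, assumptions) for the cheapest explanation of goal."""
    if goal in FACTS:
        return 0.0, set()
    if depth > 10:                             # crude depth guard
        return COSTS.get(goal, DEFAULT_COST), {goal}
    best = (COSTS.get(goal, DEFAULT_COST), {goal})   # last resort: assume goal
    for body in RULES.get(goal, []):
        cost, assumed = 0.0, set()
        for subgoal in body:
            c, a = prove(subgoal, depth + 1)
            cost, assumed = cost + c, assumed | a
        if cost < best[0]:
            best = (cost, assumed)
    return best

print(prove("fell(john)"))   # (0.4, {'on_ice(john)'}): derived, assuming on_ice
```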
Intelligent Systems in Molecular Biology | 2005
Ben Wellner
A pervasive problem facing many biomedical text mining applications is that of correctly associating mentions of entities in the literature with corresponding concepts in a database or ontology. Attempts to build systems for automating this process have shown promise as demonstrated by the recent BioCreAtIvE Task 1B evaluation. A significant obstacle to improved performance for this task, however, is a lack of high quality training data. In this work, we explore methods for improving the quality of (noisy) Task 1B training data using variants of weakly supervised learning methods. We present positive results demonstrating that these methods result in an improvement in training data quality as measured by improved system performance over the same system using the originally labeled data.
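One simple variant of this idea is a single relabeling pass: train on the noisy labels, then flip those labels the model contradicts with high confidence. The sketch below, on synthetic data, is only meant to convey that loop; it is not the paper's specific method or data.

```python
# One weakly supervised relabeling pass over noisy training labels.
# Data, model choice, and the confidence threshold are illustrative.
import numpy as np
from sklearn.linear_model import LogisticRegression

def relabel(X, y_noisy, threshold=0.9):
    clf = LogisticRegression(max_iter=500).fit(X, y_noisy)
    probs = clf.predict_proba(X)
    preds = clf.classes_[probs.argmax(axis=1)]
    confident = probs.max(axis=1) >= threshold
    return np.where(confident & (preds != y_noisy), preds, y_noisy)

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 5))
y_true = (X[:, 0] > 0).astype(int)
flip = rng.random(200) < 0.15                 # simulate 15% label noise
y_noisy = np.where(flip, 1 - y_true, y_true)
y_clean = relabel(X, y_noisy)
print("labels changed:", int((y_clean != y_noisy).sum()))
```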
Data Integration in the Life Sciences | 2012
John D. Burger; Emily Doughty; Samuel Bayer; David Tresner-Kirsch; Ben Wellner; John S. Aberdeen; Kyungjoon Lee; Maricel G. Kann; Lynette Hirschman
We describe an experiment to elicit judgments on the validity of gene-mutation relations in MEDLINE abstracts via crowdsourcing. The biomedical literature contains rich information on such relations, but the correct pairings are difficult to extract automatically because a single abstract may mention multiple genes and mutations. We ran an experiment presenting candidate gene-mutation relations as Amazon Mechanical Turk HITs (human intelligence tasks). We extracted candidate mutations from a corpus of 250 MEDLINE abstracts using EMU combined with curated gene lists from NCBI. The resulting document-level annotations were projected into the abstract text to highlight mentions of genes and mutations for review. Reviewers returned results within 36 hours. Initial weighted results evaluated against a gold standard of expert curated gene-mutation relations achieved 85% accuracy, with the best reviewer achieving 91% accuracy. We expect performance to increase with further experimentation, providing a scalable approach for rapid manual curation of important biological relations.
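The "weighted results" can be thought of along the lines of the sketch below: weight each worker by smoothed accuracy on gold control items, then take a weighted vote per candidate relation. The weighting formula here is an assumption for illustration, not necessarily the one used in the experiment.

```python
# Weighted majority voting over crowd judgments, with worker weights
# estimated from gold-labeled control items.
from collections import defaultdict

def worker_weights(judgments, gold):
    """judgments: {(worker, item): label}; gold: {item: label} for controls."""
    correct, seen = defaultdict(int), defaultdict(int)
    for (worker, item), label in judgments.items():
        if item in gold:
            seen[worker] += 1
            correct[worker] += int(label == gold[item])
    return {w: (correct[w] + 1) / (seen[w] + 2) for w in seen}  # smoothed accuracy

def weighted_vote(judgments, weights):
    scores = defaultdict(lambda: defaultdict(float))
    for (worker, item), label in judgments.items():
        scores[item][label] += weights.get(worker, 0.5)
    return {item: max(labels, key=labels.get) for item, labels in scores.items()}

judgments = {("w1", "ctrl1"): "valid", ("w2", "ctrl1"): "invalid",
             ("w1", "pair7"): "valid", ("w2", "pair7"): "invalid"}
gold = {"ctrl1": "valid"}
weights = worker_weights(judgments, gold)
print(weighted_vote(judgments, weights))   # pair7 resolved in favor of w1
```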
Intelligent Systems in Molecular Biology | 2005
Ben Wellner; José M. Castaño; James Pustejovsky
In this paper we present the evaluation of a set of string similarity metrics used to resolve the mapping from strings to concepts in the UMLS Metathesaurus. String similarity is conceived as a single component in a full reference resolution system that would resolve such a mapping. Given this qualification, we obtain positive results, achieving 73.6 F-measure (76.1 precision and 71.4 recall) for the task of assigning the correct UMLS concept to a given string. Our results demonstrate that adaptive string similarity methods based on Conditional Random Fields outperform standard metrics in this domain.
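The baseline setup can be sketched as picking, for each mention, the candidate concept string with the highest similarity score. Token-level Jaccard below stands in for the "standard metrics"; the concept IDs are placeholders, and the paper's stronger results come from an adaptive, CRF-based similarity model rather than a fixed metric like this.

```python
# Map a mention to the candidate concept whose preferred name scores
# highest under a simple token-level Jaccard similarity.
def jaccard(a, b):
    ta, tb = set(a.lower().split()), set(b.lower().split())
    return len(ta & tb) / len(ta | tb) if ta | tb else 0.0

def best_concept(mention, candidates):
    """candidates: {concept_id: preferred_name}; IDs here are placeholders."""
    return max(candidates, key=lambda cid: jaccard(mention, candidates[cid]))

candidates = {"C001": "myocardial infarction",
              "C002": "postoperative myocardial infarction"}
print(best_concept("acute myocardial infarction", candidates))   # C001
```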
International Conference on Human Language Technology Research | 2001
Lynette Hirschman; Kris Concepcion; Laurie E. Damianos; David S. Day; John Delmore; Lisa Ferro; John Griffith; John C. Henderson; Jeff Kurtz; Inderjeet Mani; Scott A. Mardis; Tom McEntee; Keith J. Miller; Beverly Nunan; Jay M. Ponte; Florence Reeder; Ben Wellner; George Wilson; Alex Yeh
As part of MITRE's work under the DARPA TIDES (Translingual Information Detection, Extraction and Summarization) program, we are preparing a series of demonstrations to showcase the TIDES Integrated Feasibility Experiment on Bio-Security (IFE-Bio). The current demonstration illustrates some of the resources that can be made available to analysts tasked with monitoring infectious disease outbreaks and other biological threats.
Conference on Computational Natural Language Learning | 2009
Marc B. Vilain; Jonathan Huggins; Ben Wellner
This paper is concerned with statistical methods for treating long-distance dependencies. We focus in particular on a case of substantial recent interest: that of long-distance dependency effects in entity extraction. We introduce a new approach to capturing these effects through a simple feature copying preprocess, and demonstrate substantial performance gains on several entity extraction tasks.
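A rough sketch of the feature-copying preprocess, with invented feature names: when a token string recurs later in a document, context features observed at its earlier mentions are copied onto the later occurrence, so that a sentence-level tagger can exploit the long-distance evidence.

```python
# Copy informative local-context features from earlier mentions of a
# token string onto its later occurrences. Which features count as
# "informative" is an illustrative assumption.
from collections import defaultdict

def add_copied_features(doc_tokens, doc_features):
    """doc_tokens: list of token strings; doc_features: parallel list of
    per-token feature dicts. Returns features augmented with copies."""
    seen = defaultdict(set)
    augmented = []
    for tok, feats in zip(doc_tokens, doc_features):
        new_feats = dict(feats)
        for f in seen[tok.lower()]:
            new_feats["copied:" + f] = 1          # evidence from a prior mention
        for f, v in feats.items():
            if v and f.startswith("prev_word="):  # treat left context as informative
                seen[tok.lower()].add(f)
        augmented.append(new_feats)
    return augmented

tokens = ["Jordan", "said", "...", "Jordan"]
features = [{"prev_word=president": 1}, {}, {}, {"prev_word=in": 1}]
print(add_copied_features(tokens, features)[-1])
# {'prev_word=in': 1, 'copied:prev_word=president': 1}
```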