Lance A. Ramshaw
BBN Technologies
Publications
Featured research published by Lance A. Ramshaw.
meeting of the association for computational linguistics | 1999
Lance A. Ramshaw; Mitchell P. Marcus
Transformation-based learning, a technique introduced by Eric Brill (1993b), has been shown to do part-of-speech tagging with fairly high accuracy. This same method can be applied at a higher level of textual interpretation for locating chunks in the tagged text, including non-recursive “baseNP” chunks. For this purpose, it is convenient to view chunking as a tagging problem by encoding the chunk structure in new tags attached to each word. In automatic tests using Treebank-derived data, this technique achieved recall and precision rates of roughly 93% for baseNP chunks (trained on 950K words) and 88% for somewhat more complex chunks that partition the sentence (trained on 200K words). Working in this new application and with larger template and training sets has also required some interesting adaptations to the transformation-based learning approach.
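The core move in the abstract above is to recast chunking as word-level tagging by encoding chunk boundaries in per-word tags. As a minimal sketch (using the common B/I/O convention; function names are illustrative, not from the paper):

```python
# Chunk structure is folded into per-word tags (B = first word of a
# chunk, I = inside a chunk, O = outside), so any word tagger can be
# trained to chunk. Illustrative encode/decode round trip:

def encode_chunks(words, chunks):
    """chunks: list of (start, end) word-index pairs, end exclusive."""
    tags = ["O"] * len(words)
    for start, end in chunks:
        tags[start] = "B"
        for i in range(start + 1, end):
            tags[i] = "I"
    return tags

def decode_chunks(tags):
    """Recover (start, end) chunk spans from a B/I/O tag sequence."""
    spans, start = [], None
    for i, tag in enumerate(tags):
        if tag == "B":
            if start is not None:
                spans.append((start, i))
            start = i
        elif tag == "O":
            if start is not None:
                spans.append((start, i))
            start = None
    if start is not None:
        spans.append((start, len(tags)))
    return spans
```

Because the encoding is invertible, chunk-level recall and precision can be computed directly from the tagger's output spans.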
north american chapter of the association for computational linguistics | 2006
Eduard H. Hovy; Mitchell P. Marcus; Martha Palmer; Lance A. Ramshaw; Ralph M. Weischedel
We describe the OntoNotes methodology and its result, a large multilingual richly-annotated corpus constructed at 90% interannotator agreement. An initial portion (300K words of English newswire and 250K words of Chinese newswire) will be made available to the community during 2007.
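The 90% figure above is an interannotator agreement rate. The basic observed-agreement calculation is just the fraction of items on which two annotators assign the same label (OntoNotes' actual metric is computed per annotation layer; this sketch shows only the core idea):

```python
# Observed agreement: fraction of items labeled identically by two
# annotators. A first-pass agreement statistic, not chance-corrected.

def observed_agreement(labels_a, labels_b):
    """Fraction of items receiving the same label from both annotators."""
    assert len(labels_a) == len(labels_b)
    matches = sum(a == b for a, b in zip(labels_a, labels_b))
    return matches / len(labels_a)
```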
meeting of the association for computational linguistics | 1999
Michael Collins; Jan Hajic; Lance A. Ramshaw; Christoph Tillmann
This paper considers statistical parsing of Czech, which differs radically from English in at least two respects: (1) it is a highly inflected language, and (2) it has relatively free word order. These differences are likely to pose new problems for techniques that have been developed on English. We describe our experience in building on the parsing model of (Collins 97). Our final results, 80% dependency accuracy, represent good progress towards the 91% accuracy of the parser on English (Wall Street Journal) text.
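The 80% and 91% figures quoted above are dependency accuracy: the fraction of words whose head is predicted correctly. A minimal sketch of the computation:

```python
# Dependency accuracy over a sentence (or corpus) represented as a
# head-index list: position i holds the index of word i's head (0 = root).

def dependency_accuracy(gold_heads, predicted_heads):
    """Fraction of words assigned the correct head."""
    assert len(gold_heads) == len(predicted_heads)
    correct = sum(g == p for g, p in zip(gold_heads, predicted_heads))
    return correct / len(gold_heads)
```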
international conference on semantic computing | 2007
Sameer Pradhan; Eduard H. Hovy; Mitch Marcus; Martha Palmer; Lance A. Ramshaw; Ralph M. Weischedel
The OntoNotes project is creating a corpus of large-scale, accurate, and integrated annotation of multiple levels of the shallow semantic structure in text. Such rich, integrated annotation covering many levels will allow for richer, cross-level models enabling significantly better automatic semantic analysis. At the same time, it demands a robust, efficient, scalable mechanism for storing and accessing these complex inter-dependent annotations. We describe a relational database representation that captures both the inter- and intra-layer dependencies and provide details of an object-oriented API for efficient, multi-tiered access to this data.
international conference on semantic computing | 2007
Sameer Pradhan; Lance A. Ramshaw; Ralph M. Weischedel; Jessica MacBride; Linnea Micciulla
Most research in the field of anaphora or coreference detection has been limited to noun phrase coreference, usually on a restricted set of entities, such as ACE entities. In part, this has been due to the lack of corpus resources tagged with general anaphoric coreference. The OntoNotes project is creating a large-scale, accurate corpus for general anaphoric coreference that covers entities and events not limited to noun phrases or a limited set of entity types. The coreference layer in OntoNotes constitutes one part of a multi-layer, integrated annotation of shallow semantic structure in text. This paper presents an initial model for unrestricted coreference based on this data that uses a machine learning architecture with state-of-the-art features. Significant improvements can be expected from using such cross-layer information for training predictive models. This paper describes the coreference annotation in OntoNotes, presents the baseline model, and provides an analysis of the contribution of this new resource in the context of recent MUC and ACE results.
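The baseline described above is a machine-learned coreference model. A common architecture for such models classifies pairs of mentions using surface features; the sketch below is purely illustrative (the mention representation and features here are invented, not the OntoNotes feature set):

```python
# Illustrative mention-pair feature extraction for coreference.
# Each mention is a hypothetical dict with 'text', 'head', and
# 'sentence' keys; the three features are typical string-match and
# distance cues used by pairwise coreference classifiers.

def pair_features(antecedent, anaphor):
    """Surface features over a candidate antecedent/anaphor pair."""
    return {
        "exact_match": antecedent["text"].lower() == anaphor["text"].lower(),
        "head_match": antecedent["head"].lower() == anaphor["head"].lower(),
        "sentence_distance": abs(anaphor["sentence"] - antecedent["sentence"]),
    }

feats = pair_features(
    {"text": "Barack Obama", "head": "Obama", "sentence": 0},
    {"text": "Obama", "head": "Obama", "sentence": 2},
)
```

A classifier trained on such feature vectors then decides, pair by pair, whether two mentions corefer.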
meeting of the association for computational linguistics | 1991
Lance A. Ramshaw
In modeling the structure of task-related discourse using plans, it is important to distinguish between plans that the agent has adopted and is pursuing and those that are only being considered and explored, since the kinds of utterances arising from a particular domain plan and the patterns of reference to domain plans and movement within the plan tree are quite different in the two cases. This paper presents a three-level discourse model that uses separate domain and exploration layers, in addition to a layer of discourse metaplans, allowing these distinct behavior patterns and the plan adoption and reconsideration moves they imply to be recognized and modeled.
international conference on human language technology research | 2001
Lance A. Ramshaw; Elizabeth Boschee; Sergey Bratus; Scott Miller; Rebecca Stone; Ralph M. Weischedel; Alex Zamanian
Unlike earlier information extraction research programs, the new ACE (Automatic Content Extraction) program calls for entity extraction by identifying and linking all of the mentions of an entity in the source text, including names, descriptions, and pronouns. Coreference is therefore a key component. BBN has developed statistical coreference models for this task, including one for pronoun coreference that we describe here in some detail. In addition, ACE calls for extraction not just from clean text, but also from noisy speech and OCR input. Since speech recognizer output includes neither case nor punctuation, we have extended our statistical parser to perform sentence breaking integrated with parsing in a probabilistic model.
International Journal of Semantic Computing | 2007
Sameer Pradhan; Eduard H. Hovy; Mitchell P. Marcus; Martha Palmer; Lance A. Ramshaw; Ralph M. Weischedel
The OntoNotes project is creating a corpus of large-scale, accurate, and integrated annotation of multiple levels of the shallow semantic structure in text. Such rich, integrated annotation covering many levels will allow for richer, cross-level models enabling significantly better automatic semantic analysis. At the same time, it demands a robust, efficient, scalable mechanism for storing and accessing these complex inter-dependent annotations. We describe a relational database representation that captures both the inter- and intra-layer dependencies and provide details of an object-oriented API for efficient, multi-tiered access to this data.
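The relational representation described above can be pictured as annotation layers that reference token spans by foreign key, so both inter- and intra-layer dependencies become ordinary joins. A minimal sketch (SQLite here is only for illustration; all table and column names are invented, and the paper's actual schema and API are richer):

```python
import sqlite3

# Toy two-table layout: tokens, plus annotations that point at token
# spans. Any layer ('syntax', 'propbank', 'coreference', ...) shares
# the same span-referencing pattern, so cross-layer queries are joins.

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE token (
    id INTEGER PRIMARY KEY,
    document TEXT, position INTEGER, word TEXT
);
CREATE TABLE annotation (
    id INTEGER PRIMARY KEY,
    layer TEXT,
    label TEXT,
    start_token INTEGER REFERENCES token(id),
    end_token   INTEGER REFERENCES token(id)
);
""")
conn.execute("INSERT INTO token VALUES (1, 'doc1', 0, 'BBN')")
conn.execute("INSERT INTO annotation VALUES (1, 'coreference', 'ENT1', 1, 1)")

# Cross-layer access is a join from annotations back to tokens:
row = conn.execute("""
    SELECT a.layer, a.label, t.word
    FROM annotation a JOIN token t ON a.start_token = t.id
""").fetchone()
```

An object-oriented API of the kind the abstract mentions would wrap such queries so that client code navigates annotations without writing SQL.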
human language technology | 1989
Ralph M. Weischedel; Robert J. Bobrow; Damaris M. Ayuso; Lance A. Ramshaw
Although natural language technology has achieved a high degree of domain independence through separating domain-independent modules from domain-dependent knowledge bases, portability, as measured by effort to move from one application to another, is still a problem. Here we describe a knowledge acquisition tool (KNACQ) that has sharply decreased our effort in building knowledge bases. The knowledge bases acquired with KNACQ are used by both the understanding components and the generation components of Janus.
Proceedings of the TIPSTER Text Program: Phase III | 1998
Scott Miller; Michael Crystal; Heidi Fox; Lance A. Ramshaw; Richard M. Schwartz; Rebecca Stone; Ralph M. Weischedel
All of BBN's research under the TIPSTER III program has focused on doing extraction by applying statistical models trained on annotated data, rather than by using programs that execute hand-written rules. Within the context of MUC-7, the SIFT system for extraction of template entities (TE) and template relations (TR) used a novel, integrated syntactic/semantic language model to extract sentence-level information, and then synthesized information across sentences using in part a trained model for cross-sentence relations. At the named entity (NE) level as well, in both MET-1 and MUC-7, BBN employed a trained, HMM-based model. The results in these TIPSTER evaluations are evidence that such trained systems, even at their current level of development, can perform roughly on a par with those based on rules hand-tailored by experts. In addition, such trained systems have some significant advantages:
• They can be easily ported to new domains by simply annotating fresh data.
• The complex interactions that make rule-based systems difficult to develop and maintain can here be learned automatically from the training data.
We believe that improved and extended versions of such trained models have the potential for significant further progress toward practical systems for information extraction.
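An HMM-based model of the kind mentioned above scores tag sequences with transition and emission probabilities and decodes with Viterbi search. A toy sketch, with an invented two-state model (not BBN's actual model or probabilities):

```python
# Toy Viterbi decoder for an HMM tagger. Unseen emissions get a small
# floor probability (1e-6) in place of proper smoothing.

def viterbi(words, states, start_p, trans_p, emit_p):
    """Return the most probable state sequence for a word sequence."""
    # paths[s] = (probability of best path ending in state s, that path)
    paths = {s: (start_p[s] * emit_p[s].get(words[0], 1e-6), [s])
             for s in states}
    for word in words[1:]:
        new_paths = {}
        for s in states:
            prob, prev = max(
                (paths[p][0] * trans_p[p][s] * emit_p[s].get(word, 1e-6), p)
                for p in states)
            new_paths[s] = (prob, paths[prev][1] + [s])
        paths = new_paths
    return max(paths.values())[1]

# Invented toy model: 'NAME' emits entity words, 'O' everything else.
states = ["NAME", "O"]
start_p = {"NAME": 0.3, "O": 0.7}
trans_p = {"NAME": {"NAME": 0.5, "O": 0.5}, "O": {"NAME": 0.2, "O": 0.8}}
emit_p = {"NAME": {"BBN": 0.8}, "O": {"developed": 0.5, "it": 0.5}}
best = viterbi(["BBN", "developed", "it"], states, start_p, trans_p, emit_p)
```

Training such a tagger amounts to estimating the transition and emission tables from annotated data, which is why porting to a new domain reduces to annotating fresh text.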