Robert J. Gaizauskas | Researchain

Archive Network Publication Hotspot Collaboration

Network

Latest external collaboration on country level. Dive into details by clicking on the dots.

Explore More

Hotspot

Dive into the research topics where Robert J. Gaizauskas is active.

Explore More

Publication

Featured researches published by Robert J. Gaizauskas.

conference on applied natural language processing | 1997

GATE - a General Architecture for Text Engineering

Hamish Cunningham; Kevin Humphreys; Robert J. Gaizauskas; Yorick Wilks

This paper presents the design, implementation and evaluation of GATE, a General Architecture for Text Engineering.GATE lies at the intersection of human language computation and software engineering, and constitutes aninfrastructural system supporting research and development of languageprocessing software.

Natural Language Engineering | 2001

Natural language question answering: the view from here

Lynette Hirschman; Robert J. Gaizauskas

As users struggle to navigate the wealth of on-line information now available, the need for automated question answering systems becomes more urgent. We need systems that allow a user to ask a question in everyday language and receive an answer quickly and succinctly, with sufficient context to validate the answer. Current search engines can return ranked lists of documents, but they do not deliver answers to the user.Question answering systems address this problem. Recent successes have been reported in a series of question-answering evaluations that started in 1999 as part of the Text Retrieval Conference (TREC). The best systems are now able to answer more than two thirds of factual questions in this evaluation.

meeting of the association for computational linguistics | 2007

SemEval-2007 Task 15: TempEval Temporal Relation Identification

Marc Verhagen; Robert J. Gaizauskas; Frank Schilder; Mark Hepple; Graham Katz; James Pustejovsky

The TempEval task proposes a simple way to evaluate automatic extraction of temporal relations. It avoids the pitfalls of evaluating a graph of inter-related labels by defining three sub tasks that allow pairwise evaluation of temporal relations. The task not only allows straightforward evaluation, it also avoids the complexities of full temporal parsing.

Journal of Documentation | 1998

Information Extraction: Beyond Document Retrieval

Robert J. Gaizauskas; Yorick Wilks

In this paper we give a synoptic view of the growth of the text processing technology of information extraction (IE) whose function is to extract information about a pre‐specified set of entities, relations or events from natural language texts and to record this information in structured representations called templates. Here we describe the nature of the IE task, review the history of the area from its origins in AI work in the 1960s and 70s till the present, discuss the techniques being used to carry out the task, describe application areas where IE systems are or are about to be at work, and conclude with a discussion of the challenges facing the area. What emerges is a picture of an exciting new text processing technology with a host of new applications, both on its own and in conjunction with other technologies, such as information retrieval, machine translation and data mining.

MUC6 '95 Proceedings of the 6th conference on Message understanding | 1995

University of Sheffield: description of the LaSIE system as used for MUC-6

Robert J. Gaizauskas; Kevin Humphreys; Hamish Cunningham; Yorick Wilks

The LaSIE (Large Scale Information Extraction) system has been developed at the University of Sheffield as part of an ongoing research effort into information extraction and, more generally, natural language engineering.

pacific symposium on biocomputing | 1999

Two applications of information extraction to biological science journal articles: enzyme interactions and protein structures.

Kevin Humphreys; George Demetriou; Robert J. Gaizauskas

Information extraction technology, as defined and developed through the U.S. DARPA Message Understanding Conferences (MUCs), has proved successful at extracting information primarily from newswire texts and primarily in domains concerned with human activity. In this paper we consider the application of this technology to the extraction of information from scientific journal papers in the area of molecular biology. In particular, we describe how an information extraction system designed to participate in the MUC exercises has been modified for two bioinformatics applications: EMPathIE, concerned with enzyme and metabolic pathways; and PASTA, concerned with protein structure. Progress to date provides convincing grounds for believing that IE techniques will deliver novel and effective ways for scientists to make use of the core literature which defines their disciplines.

Bioinformatics | 2003

Protein Structures and Information Extraction from Biological Texts: The PASTA System

Robert J. Gaizauskas; George Demetriou; Peter J. Artymiuk; Peter Willett

MOTIVATION The rapid increase in volume of protein structure literature means useful information may be hidden or lost in the published literature and the process of finding relevant material, sometimes the rate-determining factor in new research, may be arduous and slow. RESULTS We describe the Protein Active Site Template Acquisition (PASTA) system, which addresses these problems by performing automatic extraction of information relating to the roles of specific amino acid residues in protein molecules from online scientific articles and abstracts. Both the terminology recognition and extraction capabilities of the system have been extensively evaluated against manually annotated data and the results compare favourably with state-of-the-art results obtained in less challenging domains. PASTA is the first information extraction (IE) system developed for the protein structure domain and one of the most thoroughly evaluated IE system operating on biological scientific text to date. AVAILABILITY PASTA makes its extraction results available via a browser-based front end: http://www.dcs.shef.ac.uk/nlp/pasta/. The evaluation resources (manually annotated corpora) are also available through the website: http://www.dcs.shef.ac.uk/nlp/pasta/results.html.

meeting of the association for computational linguistics | 2002

Measuring Text Reuse

Paul D. Clough; Robert J. Gaizauskas; Scott Piao; Yorick Wilks

In this paper we present results from the METER (MEasuring TExt Reuse) project whose aim is to explore issues pertaining to text reuse and derivation, especially in the context of newspapers using newswire sources. Although the reuse of text by journalists has been studied in linguistics, we are not aware of any investigation using existing computational methods for this particular task. We investigate the classification of newspaper articles according to their degree of dependence upon, or derivation from, a newswire source using a simple 3-level scheme designed by journalists. Three approaches to measuring text similarity are considered: n-gram overlap, Greedy String Tiling, and sentence alignment. Measured against a manually annotated corpus of source and derived news text, we show that a combined classifier with features automatically selected performs best overall for the ternary classification achieving an average F1-measure score of 0.664 across all three categories.

Journal of Biomedical Informatics | 2009

Building a semantically annotated corpus of clinical texts

Angus Roberts; Robert J. Gaizauskas; Mark Hepple; George Demetriou; Yikun Guo; Ian Roberts; Andrea Setzer

In this paper, we describe the construction of a semantically annotated corpus of clinical texts for use in the development and evaluation of systems for automatically extracting clinically significant information from the textual component of patient records. The paper details the sampling of textual material from a collection of 20,000 cancer patient records, the development of a semantic annotation scheme, the annotation methodology, the distribution of annotations in the final corpus, and the use of the corpus for development of an adaptive information extraction system. The resulting corpus is the most richly semantically annotated resource for clinical text processing built to date, whose value has been demonstrated through its use in developing an effective information extraction system. The detailed presentation of our corpus construction and annotation methodology will be of value to others seeking to build high-quality semantically annotated corpora in biomedical domains.

automated software engineering | 2003

CM-Builder: A Natural Language-Based CASE Tool for Object-Oriented Analysis

H. M. Harmain; Robert J. Gaizauskas

Graphical CASE (Computer Aided Software Engineering) tools provide considerable help in documenting the output of the Analysis and Design stages of software development and can assist in detecting incompleteness and inconsistency in an analysis. However, these tools do not contribute to the initial, difficult stage of the analysis process, that of identifying the object classes, attributes and relationships used to model the problem domain. This paper describes an NL-Based CASE tool called Class Model Builder (CM-Builder) which aims at supporting this aspect of the Analysis stage of software development in an Object-Oriented framework. CM-Builder uses robust Natural Language Processing techniques to analyse software requirements texts written in English and constructs, either automatically or interactively with an analyst, an initial UML Class Model representing the object classes mentioned in the text and the relationships among them. The initial model can be directly input to a graphical CASE tool for further refinement by a human analyst. CM-Builder has been quantitatively evaluated in blind trials against a collection of unseen software requirements texts and we present the results of this evaluation, together with the evaluation method. The results are very encouraging and demonstrate that tools such as CM-Builder have the potential to play an important role in the software development process.

Explore More