Yorick Wilks | Researchain

Publication

Featured research published by Yorick Wilks.

conference on applied natural language processing | 1997

GATE - a General Architecture for Text Engineering

Hamish Cunningham; Kevin Humphreys; Robert J. Gaizauskas; Yorick Wilks

This paper presents the design, implementation and evaluation of GATE, a General Architecture for Text Engineering. GATE lies at the intersection of human language computation and software engineering, and constitutes an infrastructural system supporting research and development of language processing software.
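To make the architectural idea concrete: in a GATE-style system, independent processing components run in sequence over a shared document and record their results as stand-off annotations (typed character spans) rather than editing the text. The Python sketch below illustrates only that pattern. It does not reproduce GATE's own API, and every name in it (`Annotation`, `Document`, `tokeniser`, `capitalised_tagger`, `run_pipeline`) is hypothetical.

```python
import re
from dataclasses import dataclass, field


@dataclass
class Annotation:
    start: int  # offset of the first character of the span
    end: int    # offset one past the last character of the span
    kind: str   # annotation type, e.g. "Token"
    features: dict = field(default_factory=dict)


@dataclass
class Document:
    text: str
    annotations: list = field(default_factory=list)

    def of_kind(self, kind):
        return [a for a in self.annotations if a.kind == kind]


def tokeniser(doc):
    # Add one Token annotation per run of word characters; the text is never modified.
    for m in re.finditer(r"\w+", doc.text):
        doc.annotations.append(Annotation(m.start(), m.end(), "Token"))


def capitalised_tagger(doc):
    # Consumes the Token annotations produced by an earlier component.
    for tok in doc.of_kind("Token"):
        word = doc.text[tok.start:tok.end]
        if word[0].isupper():
            doc.annotations.append(
                Annotation(tok.start, tok.end, "Capitalised", {"string": word}))


def run_pipeline(doc, components):
    # Run each component in order over the same shared document.
    for component in components:
        component(doc)
    return doc


doc = run_pipeline(Document("GATE was developed in Sheffield."),
                   [tokeniser, capitalised_tagger])
print([a.features["string"] for a in doc.of_kind("Capitalised")])
```

Because each component reads only the annotation types it needs, one component can be replaced by another that produces the same annotation types without changing the rest of the pipeline.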

Artificial Intelligence | 1975

A Preferential, Pattern-Seeking, Semantics for Natural Language Inference

Yorick Wilks

Abstract The paper describes the way in which a Preference Semantics system for natural language analysis and generation tackles a difficult class of anaphoric inference problems: those requiring either analytic (conceptual) knowledge of a complex sort, or requiring weak inductive knowledge of the course of events in the real world. The method employed converts all available knowledge to a canonical template form and endeavors to create chains of non-reductive inferences from the unknowns to the possible referents. Its method for this is consistent with the overall principle of “semantic preference” used to set up the original meaning representation.

Machine Translation | 1990

Providing Machine Tractable Dictionary Tools

Yorick Wilks; Dan Fass; Cheng-ming Guo; James E. McDonald; Tony Plate; Brian M. Slator

Machine readable dictionaries (MRDs) contain knowledge about language and the world that is essential for tasks in natural language processing (NLP). However, this knowledge, collected and recorded by lexicographers for human readers, is not presented in a form that allows MRDs to be used directly for NLP tasks. What is badly needed are machine tractable dictionaries (MTDs): MRDs transformed into a format usable for NLP. This paper discusses three different but related large-scale computational methods for transforming MRDs into MTDs. The MRD used is The Longman Dictionary of Contemporary English (LDOCE). The three methods differ in the amount of knowledge they start with and the kinds of knowledge they provide. All require some handcoding of initial information but are largely automatic. Method I, a statistical approach, uses the least handcoding. It generates “relatedness” networks for words in LDOCE and presents a method for partial word sense disambiguation. Method II employs the most handcoding because it develops and builds lexical entries for a very carefully controlled defining vocabulary of 2,000 word senses (1,000 words). The payoff is that the method will provide an MTD containing highly structured semantic information. Method III requires the handcoding of a grammar and the semantic patterns used by its parser, but not the handcoding of any lexical material, because the method builds up lexical material from sources wholly within LDOCE. The information extracted is a set of individually weak sources of information that can be combined to give a strong and determinate linguistic database.
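As a rough illustration of Method I, the sketch below builds a small weighted "relatedness" network by scoring word pairs on how often they co-occur within the same dictionary definition. It is a sketch under stated assumptions: the toy `DEFINITIONS` entries are invented rather than taken from LDOCE, and pointwise mutual information (PMI) is a stand-in association measure, not necessarily the statistic the paper uses.

```python
import math
from collections import Counter
from itertools import combinations

# Hypothetical toy definitions (headword -> definition text); not LDOCE data.
DEFINITIONS = {
    "bank": "land along the side of a river",
    "shore": "land along the edge of the sea or a river",
    "loan": "money that a bank lends",
    "cash": "money in coins or notes",
}
STOP_WORDS = {"a", "along", "in", "of", "or", "that", "the"}


def content_words(text):
    return {w for w in text.lower().split() if w not in STOP_WORDS}


def relatedness_network(definitions):
    """Map each word pair that shares a definition to its PMI score (an assumed measure)."""
    n = len(definitions)
    word_freq, pair_freq = Counter(), Counter()
    for text in definitions.values():
        words = content_words(text)
        word_freq.update(words)
        pair_freq.update(frozenset(p) for p in combinations(sorted(words), 2))
    network = {}
    for pair, joint in pair_freq.items():
        a, b = tuple(pair)
        p_joint = joint / n
        p_a, p_b = word_freq[a] / n, word_freq[b] / n
        network[pair] = math.log(p_joint / (p_a * p_b))
    return network


network = relatedness_network(DEFINITIONS)
print(network[frozenset({"land", "river"})])  # log 2, about 0.693
```

With this toy data, "land" and "river" appear together in every definition that mentions either of them, so their edge scores log 2; pairs that never share a definition get no edge at all.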

Journal of Documentation | 1998

Information Extraction: Beyond Document Retrieval

Robert J. Gaizauskas; Yorick Wilks

In this paper we give a synoptic view of the growth of the text processing technology of information extraction (IE) whose function is to extract information about a pre‐specified set of entities, relations or events from natural language texts and to record this information in structured representations called templates. Here we describe the nature of the IE task, review the history of the area from its origins in AI work in the 1960s and 70s till the present, discuss the techniques being used to carry out the task, describe application areas where IE systems are or are about to be at work, and conclude with a discussion of the challenges facing the area. What emerges is a picture of an exciting new text processing technology with a host of new applications, both on its own and in conjunction with other technologies, such as information retrieval, machine translation and data mining.
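A template is simply a record with one slot per item of information the task asks for. The sketch below assumes a management-succession scenario (the kind used in the MUC-6 evaluation) and fills a hypothetical `AppointmentTemplate` from a single hand-written regular expression. Real IE systems rely on much deeper linguistic analysis, so treat this only as an illustration of text going in and filled templates coming out.

```python
import re
from dataclasses import dataclass
from typing import Optional


@dataclass
class AppointmentTemplate:
    """Hypothetical template: one slot per item of information the task requires."""
    person: Optional[str] = None
    position: Optional[str] = None
    organisation: Optional[str] = None


# Hypothetical surface pattern; real systems use far deeper analysis than this.
APPOINTMENT = re.compile(
    r"(?P<person>[A-Z]\w+ [A-Z]\w+) was appointed "
    r"(?P<position>[a-z ]+?) of "
    r"(?P<organisation>[A-Z][\w ]*\w)"
)


def extract(text):
    """Return one filled template per pattern match in the text."""
    return [AppointmentTemplate(**m.groupdict()) for m in APPOINTMENT.finditer(text)]


print(extract("Jane Smith was appointed chief executive of Acme Holdings."))
```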

MUC6 '95 Proceedings of the 6th conference on Message understanding | 1995

University of Sheffield: description of the LaSIE system as used for MUC-6

Robert J. Gaizauskas; Kevin Humphreys; Hamish Cunningham; Yorick Wilks

The LaSIE (Large Scale Information Extraction) system has been developed at the University of Sheffield as part of an ongoing research effort into information extraction and, more generally, natural language engineering.

Computational Linguistics | 2001

The interaction of knowledge sources in word sense disambiguation

Mark Stevenson; Yorick Wilks

Word sense disambiguation (WSD) is a computational linguistics task likely to benefit from the tradition of combining different knowledge sources in artificial intelligence research. An important step in exploring this hypothesis is to determine which linguistic knowledge sources are most useful and whether combining them improves results. We present a sense tagger that uses several knowledge sources. Tested accuracy exceeds 94% on our evaluation corpus. Our system attempts to disambiguate all content words in running text rather than limiting itself to a restricted vocabulary of words. We argue that this approach is more likely to assist the creation of practical systems.
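The general idea of combining knowledge sources for WSD can be sketched under simplifying assumptions: a toy, hypothetical sense inventory for "bank", two weak knowledge sources (word overlap between the context and each sense's definition, in the style of simplified Lesk, and a match between the text's topic and a sense's subject code), and a simple vote to combine them. The vote is an illustrative choice and is not necessarily how the paper combines its sources.

```python
from collections import Counter

# Hypothetical sense inventory: sense id -> (definition, subject code).
SENSES = {
    "bank/finance": ("an organisation that keeps and lends money", "ECONOMICS"),
    "bank/river": ("land along the side of a river", "GEOGRAPHY"),
}
STOP_WORDS = {"a", "along", "an", "and", "he", "into", "of", "on", "she", "that", "the"}


def words(text):
    return {w for w in text.lower().split() if w not in STOP_WORDS}


def definition_overlap(context, senses):
    """Source 1: the sense whose definition shares the most words with the context."""
    ctx = words(context)
    return max(senses, key=lambda s: len(ctx & words(senses[s][0])))


def subject_match(subject, senses):
    """Source 2: the first sense whose subject code equals the text's topic, or None."""
    return next((s for s in senses if senses[s][1] == subject), None)


def disambiguate(context, subject, senses):
    """Combine sources by simple voting; on a tie, the sense voted for first wins."""
    votes = Counter()
    for choice in (definition_overlap(context, senses), subject_match(subject, senses)):
        if choice is not None:
            votes[choice] += 1
    return votes.most_common(1)[0][0]


print(disambiguate("she sat on the bank of the river", "GEOGRAPHY", SENSES))
print(disambiguate("he paid the money into the bank", "ECONOMICS", SENSES))
```

When the two sources disagree the vote is tied, and `Counter.most_common` returns the sense that was counted first, so the definition-overlap source acts as the tie-breaker here.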

knowledge acquisition, modeling and management | 2002

User-System Cooperation in Document Annotation Based on Information Extraction

Fabio Ciravegna; Alexiei Dingli; Daniela Petrelli; Yorick Wilks

The process of document annotation for the Semantic Web is complex and time-consuming, as it requires a great deal of manual annotation. Information extraction from texts (IE) is a technology used by some very recent systems to reduce the burden of annotation. Integrating IE systems into annotation tools is quite a new development, and there is still a need to think through the impact of the IE system on the whole annotation process. In this paper we first discuss a number of requirements for using IE to support annotation. We then present and discuss a model of interaction that addresses these issues, together with Melita, an annotation framework that implements a methodology for active annotation for the Semantic Web based on IE. Finally, we present an experiment that quantifies the gain from using IE to support human annotators.

meeting of the association for computational linguistics | 2002

Measuring Text Reuse

Paul D. Clough; Robert J. Gaizauskas; Scott Piao; Yorick Wilks

In this paper we present results from the METER (MEasuring TExt Reuse) project whose aim is to explore issues pertaining to text reuse and derivation, especially in the context of newspapers using newswire sources. Although the reuse of text by journalists has been studied in linguistics, we are not aware of any investigation using existing computational methods for this particular task. We investigate the classification of newspaper articles according to their degree of dependence upon, or derivation from, a newswire source using a simple 3-level scheme designed by journalists. Three approaches to measuring text similarity are considered: n-gram overlap, Greedy String Tiling, and sentence alignment. Measured against a manually annotated corpus of source and derived news text, we show that a combined classifier with features automatically selected performs best overall for the ternary classification achieving an average F1-measure score of 0.664 across all three categories.
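Two of the three similarity measures named above can be sketched in a few lines of Python. `containment` scores the fraction of the derived text's word n-grams that also occur in the source; `greedy_string_tiling` repeatedly marks the longest unmarked common token run (a "tile") until no run reaches a minimum length, which lets it detect copied passages even after they have been reordered. These are simplified, unoptimised versions written for illustration; the project's exact parameters (n-gram size, minimum match length, normalisation) are assumptions here, and sentence alignment is not shown.

```python
def ngrams(tokens, n):
    """Set of contiguous token n-grams."""
    return {tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)}


def containment(derived, source, n=3):
    """Fraction of the derived text's n-grams that also occur in the source."""
    d = ngrams(derived, n)
    return len(d & ngrams(source, n)) / len(d) if d else 0.0


def greedy_string_tiling(a, b, min_match=3):
    """Return tiles (start_in_a, start_in_b, length), longest first.

    Each pass finds the longest runs of identical tokens not yet covered by
    a tile, marks them, and repeats until no run reaches min_match tokens.
    """
    marked_a, marked_b, tiles = set(), set(), []
    while True:
        max_len, matches = min_match - 1, []
        for i in range(len(a)):
            for j in range(len(b)):
                k = 0
                while (i + k < len(a) and j + k < len(b)
                       and a[i + k] == b[j + k]
                       and i + k not in marked_a and j + k not in marked_b):
                    k += 1
                if k > max_len:
                    max_len, matches = k, [(i, j, k)]
                elif k == max_len and k >= min_match:
                    matches.append((i, j, k))
        if max_len < min_match:
            return tiles
        for i, j, k in matches:
            # Skip a maximal match that overlaps a tile marked earlier in this pass.
            if any(i + x in marked_a or j + x in marked_b for x in range(k)):
                continue
            marked_a.update(range(i, i + k))
            marked_b.update(range(j, j + k))
            tiles.append((i, j, k))


source = "the minister said on monday that taxes would rise next year".split()
derived = "taxes would rise next year the minister said".split()
print(containment(derived, source))           # 4 of 6 trigrams shared
print(greedy_string_tiling(derived, source))  # [(0, 6, 5), (5, 0, 3)]
```

In this example the derived sentence reorders the source's material: 4 of its 6 word trigrams appear in the source, while tiling recovers both copied runs as (derived start, source start, length) tuples.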

Lecture Notes in Computer Science | 2004

Learning to Harvest Information for the Semantic Web

Fabio Ciravegna; Sam Chapman; Alexiei Dingli; Yorick Wilks

In this paper we describe a methodology for harvesting information from large distributed repositories (e.g. large Web sites) with minimum user intervention. The methodology is based on a combination of information extraction, information integration and machine learning techniques. Learning is seeded by extracting information from structured sources (e.g. databases and digital libraries) or a user-defined lexicon. The retrieved information is then used to partially annotate documents. These annotated documents bootstrap learning for simple Information Extraction (IE) methodologies, which in turn produce annotations on further documents; those documents are then used to train more complex IE engines, and so on. We describe the methodology and its implementation in the Armadillo system, compare it with the current state of the art, and describe the details of an implemented application. Finally, we draw some conclusions and highlight some challenges and future work.
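The bootstrapping cycle described above can be illustrated with a generic seed-and-pattern loop: known names annotate the documents, the words that precede them become extraction contexts, and those contexts find new names, which seed the next round. This is a toy sketch, not Armadillo's learners; the documents, the names and the single-word-context heuristic are all hypothetical.

```python
import re
from collections import Counter

# Hypothetical toy corpus; names and sentences are invented.
DOCS = [
    "The talk was given by Professor Alice Smith yesterday.",
    "Professor Alice Smith thanked Professor Bob Jones.",
    "A reply came from Professor Carol White.",
]


def learn_contexts(docs, names):
    """Count the word immediately before each occurrence of a known name."""
    contexts = Counter()
    for doc in docs:
        for name in names:
            pattern = r"\b(\w+) " + re.escape(name) + r"\b"
            contexts.update(m.group(1) for m in re.finditer(pattern, doc))
    return contexts


def apply_contexts(docs, contexts, min_count=2):
    """Treat a capitalised word pair after a frequent context word as a new name."""
    found = set()
    for word, count in contexts.items():
        if count < min_count:  # ignore rare contexts to limit noise
            continue
        pattern = r"\b" + re.escape(word) + r" ([A-Z]\w+ [A-Z]\w+)"
        for doc in docs:
            found.update(m.group(1) for m in re.finditer(pattern, doc))
    return found


def bootstrap(docs, seeds, rounds=3):
    """Alternate annotation and learning until no new names appear."""
    names = set(seeds)
    for _ in range(rounds):
        new = apply_contexts(docs, learn_contexts(docs, names)) - names
        if not new:
            break
        names |= new
    return names


print(sorted(bootstrap(DOCS, {"Alice Smith"})))
```

Starting from the single seed "Alice Smith", the first round learns the context word "Professor" and uses it to find "Bob Jones" and "Carol White"; the second round finds nothing new and the loop stops.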

Natural Language Engineering | 2002

Architectural elements of language engineering robustness

Diana Maynard; Valentin Tablan; Hamish Cunningham; Cristian Ursu; Horacio Saggion; Kalina Bontcheva; Yorick Wilks

We discuss robustness in LE systems from the perspective of engineering, and the predictability of both outputs and construction process that this entails. We present an architectural system that contributes to engineering robustness and low-overhead systems development (GATE, a General Architecture for Text Engineering). To verify our ideas, we present results from the development of a multi-purpose cross-genre Named Entity recognition system. This system aims to be robust across diverse input types, and to reduce the need for costly and time-consuming adaptation of systems to new applications, through its capability to process texts from widely differing domains and genres.
