Network


Latest external collaborations at the country level.

Hotspot


Dive into the research topics where Paul Rayson is active.

Publication


Featured research published by Paul Rayson.


The Workshop on Comparing Corpora | 2000

Comparing Corpora using Frequency Profiling

Paul Rayson; Roger Garside

This paper describes a method of comparing corpora which uses frequency profiling. The method can be used to discover key words that differentiate one corpus from another. Applied to annotated corpora, it can also discover key grammatical or word-sense categories. This provides a quick way into finding the differences between corpora, and is shown to have applications in the study of social differentiation in the use of English vocabulary, the profiling of learner English, and document analysis in the software engineering process.
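The frequency-profiling comparison described above is, in outline, a log-likelihood test of item frequencies across two corpora. A minimal sketch follows; the function names and example corpora are illustrative, and the log-likelihood form used is the standard keyness formulation rather than code from the paper:

```python
import math
from collections import Counter

def log_likelihood(freq1, total1, freq2, total2):
    """Log-likelihood keyness of an item occurring freq1 times in a corpus
    of total1 tokens versus freq2 times in a corpus of total2 tokens."""
    expected1 = total1 * (freq1 + freq2) / (total1 + total2)
    expected2 = total2 * (freq1 + freq2) / (total1 + total2)
    ll = 0.0
    if freq1 > 0:
        ll += freq1 * math.log(freq1 / expected1)
    if freq2 > 0:
        ll += freq2 * math.log(freq2 / expected2)
    return 2 * ll

def key_items(corpus_a, corpus_b, top_n=5):
    """Rank items (words, or POS / semantic tags in an annotated corpus)
    by how strongly their frequency differs between the two token lists."""
    counts_a, counts_b = Counter(corpus_a), Counter(corpus_b)
    total_a, total_b = len(corpus_a), len(corpus_b)
    scores = {
        item: log_likelihood(counts_a[item], total_a, counts_b[item], total_b)
        for item in set(counts_a) | set(counts_b)
    }
    return sorted(scores.items(), key=lambda kv: -kv[1])[:top_n]
```

An item used equally often in both corpora scores zero, while items concentrated in one corpus rise to the top of the ranking; the same scoring applies unchanged to grammatical or word-sense category labels.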


Aspect-Oriented Software Development | 2007

Semantics-based composition for aspect-oriented requirements engineering

Ruzanna Chitchyan; Awais Rashid; Paul Rayson; Robert Waters

In this paper, we discuss the limitations of the current syntactic composition mechanisms in aspect-oriented requirements engineering (AORE). We highlight that such composition mechanisms not only increase coupling between aspects and base concerns but are also insufficient to capture the intentionality of the aspect composition. Furthermore, they force the requirements engineer to reason about semantic influences and trade-offs among aspects from a syntactic perspective. We present a requirements description language (RDL) that enriches the existing natural language requirements specification with semantic information derived from the semantics of the natural language itself. Composition specifications are written based on these semantics rather than on requirements syntax, hence providing improved means of expressing the intentionality of the composition and, in turn, facilitating semantics-based reasoning about aspect influences and trade-offs. We also discuss the practicality of this RDL by outlining the automation support for requirements annotation (realized as an extension of the Wmatrix natural language processing tool suite) used to expose the semantics, which are in turn utilized to facilitate composition and analysis (supported by the MRAT tool).


Automated Software Engineering | 2005

EA-Miner: a tool for automating aspect-oriented requirements identification

Américo Sampaio; Ruzanna Chitchyan; Awais Rashid; Paul Rayson

Aspect-oriented requirements engineering helps to achieve early separation of concerns by supporting systematic analysis of broadly scoped properties such as security, real-time constraints, etc. The early identification and separation of aspects, and of the base abstractions crosscut by them, helps to avoid costly refactorings at later stages such as design and code. However, if not handled effectively, the aspect identification task can become a bottleneck requiring significant effort due to the large amount of often poorly structured or imprecise information available to a requirements engineer. In this paper, we describe a tool, EA-Miner, that provides effective automated support for identifying and separating aspectual and non-aspectual concerns, as well as their crosscutting relationships, at the requirements level. The tool utilises natural language processing techniques to reason about the properties of the concerns and to model their structure and relationships.


IEEE Transactions on Software Engineering | 2005

Shallow knowledge as an aid to deep understanding in early phase requirements engineering

Peter Sawyer; Paul Rayson; Ken Cosh

Requirements engineering's continuing dependence on natural language description has made it the focus of several efforts to apply language engineering techniques. The raw textual material that forms an input to early-phase requirements engineering, and which informs the subsequent formulation of the requirements, is inevitably uncontrolled, and this makes its processing very hard. Nevertheless, sufficiently robust techniques do exist that can be used to aid the requirements engineer, provided that the scope of what can be achieved is understood. In this paper, we show how combinations of lexical and shallow semantic analysis techniques developed from corpus linguistics can help human analysts acquire the deep understanding needed as the first step towards the synthesis of requirements.


Knowledge Based Systems | 2008

A flexible framework to experiment with ontology learning techniques

Ricardo Gacitua; Peter Sawyer; Paul Rayson

Ontology learning refers to extracting conceptual knowledge from several sources and building an ontology from scratch, or enriching or adapting an existing ontology. It uses methods from a diverse spectrum of fields such as natural language processing, artificial intelligence and machine learning. However, a crucial challenge is to quantitatively evaluate the usefulness and accuracy of both individual techniques and combinations of techniques when applied to ontology learning. This is an interesting problem because there are no published comparative studies. We are developing a flexible framework for ontology learning from text which provides a cyclical process involving the successive application of various NLP techniques and learning algorithms for concept extraction and ontology modelling. To address the above challenge, the framework provides support for evaluating the usefulness and accuracy of different techniques, and of possible combinations of techniques, within specific processes. We show our framework's efficacy as a workbench for testing and evaluating concept identification. Our initial experiment supports our assumption about the usefulness of our approach.


Information Systems Frontiers | 2002

REVERE: Support for Requirements Synthesis from Documents

Peter Sawyer; Paul Rayson; Roger Garside

Documents are important sources of system requirements. This is particularly true of domains that are document-centric in terms of their operational and development processes. For system evolution in organisations that have been subject to organisational change and loss of organisational memory, documents may be the major source of key requirements. Hence, systems engineers often face a daunting task of synthesising crucial requirements from a range of documents that include standards, interview transcripts and legacy specifications. The goal of REVERE was to investigate support for this task which has been described as document archaeology (Robertson S. and Robertson J. Mastering the Requirements Process. Reading, MA, Addison-Wesley, 1999). This paper describes the resulting REVERE toolset, its utility for document archaeology and for other tasks that have emerged in the course of our experiments with the toolset.


Computer Speech & Language | 2005

Comparing and combining a semantic tagger and a statistical tool for MWE extraction

Scott Piao; Paul Rayson; Dawn Archer; Tony McEnery

Automatic extraction of multiword expressions (MWEs) presents a tough challenge for the NLP community and corpus linguistics. Indeed, although numerous knowledge-based symbolic approaches and statistically driven algorithms have been proposed, efficient MWE extraction still remains an unsolved issue. In this paper, we evaluate the Lancaster UCREL Semantic Analysis System (henceforth USAS (Rayson, P., Archer, D., Piao, S., McEnery, T., 2004. The UCREL semantic analysis system. In: Proceedings of the LREC-04 Workshop, Beyond Named Entity Recognition Semantic labelling for NLP tasks, Lisbon, Portugal. pp. 7-12)) for MWE extraction, and explore the possibility of improving USAS by incorporating a statistical algorithm. Developed at Lancaster University, the USAS system automatically annotates English corpora with semantic category information. Employing a large-scale semantically classified multi-word expression template database, the system is also capable of detecting many multiword expressions, as well as assigning semantic field information to the MWEs extracted. Whilst USAS therefore offers a unique tool for MWE extraction, allowing us to both extract and semantically classify MWEs, it can sometimes suffer from low recall. Consequently, we have been comparing USAS, which employs a symbolic approach, to a statistical tool, which is based on collocational information, in order to determine the pros and cons of these different tools, and more importantly, to examine the possibility of improving MWE extraction by combining them. As we report in this paper, we have found a highly complementary relation between the different tools: USAS missed many domain-specific MWEs (law/court terms in this case), and the statistical tool missed many commonly used MWEs that occur in low frequencies (lower than three in this case). 
Due to their complementary relation, we are proposing that MWE coverage can be significantly increased by combining a lexicon-based symbolic approach and a collocation-based statistical approach.
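The statistical side of the comparison above is collocation-based. As a minimal illustration of why a frequency cutoff makes such a tool miss rare MWEs, here is a toy bigram extractor scored by pointwise mutual information; the scoring function, the threshold of three, and the example text are assumptions for illustration, not the paper's actual tool:

```python
import math
from collections import Counter

def pmi_bigrams(tokens, min_freq=3, top_n=5):
    """Score adjacent word pairs by pointwise mutual information,
    keeping only bigrams seen at least min_freq times -- exactly the
    kind of cutoff that lets low-frequency MWEs slip through."""
    unigrams = Counter(tokens)
    bigrams = Counter(zip(tokens, tokens[1:]))
    n = len(tokens)
    scores = {}
    for (w1, w2), freq in bigrams.items():
        if freq < min_freq:
            continue  # rare candidates are discarded by the cutoff
        p_pair = freq / (n - 1)
        p_indep = (unigrams[w1] / n) * (unigrams[w2] / n)
        scores[(w1, w2)] = math.log(p_pair / p_indep)
    return sorted(scores.items(), key=lambda kv: -kv[1])[:top_n]
```

A lexicon-based tagger has the opposite failure mode: it finds listed expressions regardless of frequency but misses domain-specific terms absent from its lexicon, which is why the two approaches complement each other.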


Archive | 2002

Grammatical word class variation within the British National Corpus sampler

Paul Rayson; Andrew Wilson; Geoffrey Leech

This paper examines the relationship between part-of-speech frequencies and text typology in the British National Corpus Sampler. Four pairwise comparisons of part-of-speech frequencies were made: written language vs. spoken language; informative writing vs. imaginative writing; conversational speech vs. ‘task-oriented’ speech; and imaginative writing vs. ‘task-oriented’ speech. The following variation gradient was hypothesized: conversation – task-oriented speech – imaginative writing – informative writing; however, the actual progression was: conversation – imaginative writing – task-oriented speech – informative writing. It thus seems that genre and medium interact in a more complex way than originally hypothesized. However, this conclusion has been made on the basis of broad, pre-existing text types within the BNC, and, in future, the internal structure of these text types may need to be addressed.


Meeting of the Association for Computational Linguistics | 2003

Extracting Multiword Expressions with A Semantic Tagger

Scott Piao; Paul Rayson; Dawn Archer; Andrew Wilson; Tony McEnery

Automatic extraction of multiword expressions (MWE) presents a tough challenge for the NLP community and corpus linguistics. Although various statistically driven or knowledge-based approaches have been proposed and tested, efficient MWE extraction still remains an unsolved issue. In this paper, we present our research work in which we tested approaching the MWE issue using a semantic field annotator. We use an English semantic tagger (USAS) developed at Lancaster University to identify multiword units which depict single semantic concepts. The Meter Corpus (Gaizauskas et al., 2001; Clough et al., 2002) built in Sheffield was used to evaluate our approach. In our evaluation, this approach extracted a total of 4,195 MWE candidates, of which, after manual checking, 3,792 were accepted as valid MWEs, producing a precision of 90.39% and an estimated recall of 39.38%. Of the accepted MWEs, 68.22% or 2,587 are low frequency terms, occurring only once or twice in the corpus. These results show that our approach provides a practical solution to MWE extraction.
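The percentages reported above follow directly from the candidate counts in the abstract (precision is accepted candidates over extracted candidates); a quick arithmetic check:

```python
candidates = 4195  # MWE candidates extracted by the semantic tagger
accepted = 3792    # candidates judged valid after manual checking
low_freq = 2587    # accepted MWEs occurring only once or twice

precision = accepted / candidates
low_freq_share = low_freq / accepted

print(f"precision      = {precision:.2%}")       # 90.39%
print(f"low-freq share = {low_freq_share:.2%}")  # 68.22%
```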


International Workshop on Computational Forensics | 2008

Supporting Law Enforcement in Digital Communities through Natural Language Analysis

Danny Hughes; Paul Rayson; James Walkerdine; Kevin Lee; Phil Greenwood; Awais Rashid; Corinne May-Chahal; Margaret Brennan

Recent years have seen an explosion in the number and scale of digital communities (e.g. peer-to-peer file sharing systems, chat applications and social networking sites). Unfortunately, digital communities are host to significant criminal activity, including copyright infringement, identity theft and child sexual abuse. Combating this growing level of crime is problematic due to the ever-increasing scale of today's digital communities. This paper presents an approach to provide automated support for the detection of child sexual abuse related activities in digital communities. Specifically, we analyze the characteristics of child sexual abuse media distribution in P2P file sharing networks and carry out an exploratory study to show that corpus-based natural language analysis may be used to automate the detection of this activity. We then give an overview of how this approach can be extended to police chat and social networking communities.

Collaboration


Dive into Paul Rayson's collaborations.

Top Co-Authors

Dawn Archer

University of Central Lancashire
