Publication


Featured research published by Andrew Wilson.


Archive | 2002

Grammatical word class variation within the British National Corpus sampler

Paul Rayson; Andrew Wilson; Geoffrey Leech

This paper examines the relationship between part-of-speech frequencies and text typology in the British National Corpus Sampler. Four pairwise comparisons of part-of-speech frequencies were made: written language vs. spoken language; informative writing vs. imaginative writing; conversational speech vs. ‘task-oriented’ speech; and imaginative writing vs. ‘task-oriented’ speech. The following variation gradient was hypothesized: conversation – task-oriented speech – imaginative writing – informative writing; however, the actual progression was: conversation – imaginative writing – task-oriented speech – informative writing. It thus seems that genre and medium interact in a more complex way than originally hypothesized. However, this conclusion has been made on the basis of broad, pre-existing text types within the BNC, and, in future, the internal structure of these text types may need to be addressed.


Meeting of the Association for Computational Linguistics | 2003

Extracting Multiword Expressions with A Semantic Tagger

Scott Piao; Paul Rayson; Dawn Archer; Andrew Wilson; Tony McEnery

Automatic extraction of multiword expressions (MWE) presents a tough challenge for the NLP community and corpus linguistics. Although various statistically driven or knowledge-based approaches have been proposed and tested, efficient MWE extraction still remains an unsolved issue. In this paper, we present our research work in which we tested approaching the MWE issue using a semantic field annotator. We use an English semantic tagger (USAS) developed at Lancaster University to identify multiword units which depict single semantic concepts. The Meter Corpus (Gaizauskas et al., 2001; Clough et al., 2002) built in Sheffield was used to evaluate our approach. In our evaluation, this approach extracted a total of 4,195 MWE candidates, of which, after manual checking, 3,792 were accepted as valid MWEs, producing a precision of 90.39% and an estimated recall of 39.38%. Of the accepted MWEs, 68.22% or 2,587 are low frequency terms, occurring only once or twice in the corpus. These results show that our approach provides a practical solution to MWE extraction.
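The precision and low-frequency figures quoted in this abstract follow directly from the reported counts. A minimal sketch of that arithmetic is given below; the variable names are illustrative and not taken from the paper, and the estimated recall is not recomputed because the abstract does not state the size of the reference set.

```python
# Counts reported in the abstract.
candidates_extracted = 4195   # MWE candidates proposed by the tagger
accepted_after_check = 3792   # candidates accepted as valid after manual checking
low_frequency_mwes = 2587     # accepted MWEs occurring only once or twice

# Precision: share of extracted candidates that were accepted.
precision = accepted_after_check / candidates_extracted

# Share of the accepted MWEs that are low-frequency items.
low_frequency_share = low_frequency_mwes / accepted_after_check

print(f"precision: {precision:.2%}")
print(f"low-frequency share: {low_frequency_share:.2%}")
```

Rounded to two decimal places, both values agree with the percentages given in the abstract (90.39% and 68.22%).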


Archive | 1999

Standards for Tagsets.

Geoffrey Leech; Andrew Wilson

We normally add annotations to a text so that they can be re-used as a general research resource, by varied end-users other than the annotators themselves. This implies that the choice of linguistic categories for annotation should take account of the need for annotations which are as far as possible theoretically neutral, so that their re-use is not limited to those who have adopted a particular theoretical framework. Although this ideal of ‘theoretical neutrality’ is itself controversial and probably unattainable, it may be realistically seen as one of the goals of annotation. This is because we need to minimize the amount of automatic or manual adaptation that would have to be undertaken for the annotated corpus to be successfully used by research groups taking different theoretical positions. In the interests of interchangeability and re-usability of annotated corpora, it is important to avoid a ‘free-for-all’, or a ‘reinvention of the wheel’ every time a new project begins. A possible strategy to accomplish this is to strive for some kind of standardization.


ReCALL | 1997

Teaching and Language Corpora (TALC)

Tony McEnery; Andrew Wilson

In choosing a title for this paper, we have consciously copied the name of the series of biannual conferences, started at Lancaster in 1994, which aim to bring together those who have an interest in the application of corpora to the teaching of language and linguistics. Already, those conferences have set in train a series of publications – conference proceedings (Wilson and McEnery, 1994; Botley, Glass, McEnery and Wilson, 1996), a general selection of papers (Wichmann, Knowles, McEnery and Fligelstone, 1997) and a collection of papers related to multilingual corpora (Botley, McEnery and Wilson, 1997). The aim of this paper is to summarize the progress to date in the field of teaching and language corpora, both as a general introduction and as a gateway to the more comprehensive literature which is developing. As such, this paper owes a considerable debt to all of the participants at the past two conferences.


Literary and Linguistic Computing | 2006

Development and Application of a Content Analysis Dictionary for Body Boundary Research

Andrew Wilson

Body image—especially self-perceptions of body boundaries—can have a significant impact on emotional well-being, personality, and behaviour. Fisher and Cleveland developed a scoring system for identifying two categories of body boundary imagery (Barrier and Penetration) in Rorschach test protocols, which Newbold has since extended to the analysis of narrative text. This paper describes the initial development of a content analysis dictionary (the Body Type Dictionary) for automating Barrier and Penetration scoring on English-language texts. To demonstrate its use and to provide a preliminary measure of validation, the dictionary is applied to a set of fictional fetish narratives and to samples from mainstream romantic fiction. The results demonstrate that the fetish narratives contain a significantly greater amount of Barrier imagery than the mainstream writing samples, which tallies with previous observations about body boundaries and appears to support the claim that writers with uncertain self-perceived boundaries will use more body boundary imagery in their writing. Suggestions for further validation studies and applications are given.
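Dictionary-based content analysis of the kind described in this abstract can be sketched as counting category words per text and normalising by text length. The sketch below is only an illustration: the word lists are invented placeholders, not entries from the Body Type Dictionary, and the function name `score_text` and the simple tokenisation are assumptions.

```python
import re
from collections import Counter

# Hypothetical toy word lists; the real dictionary entries are not given in the abstract.
TOY_DICTIONARY = {
    "barrier": {"armor", "shell", "wall", "coat"},
    "penetration": {"wound", "torn", "hole", "broken"},
}

def score_text(text, dictionary=TOY_DICTIONARY):
    """Return, for each category, the proportion of tokens that match it."""
    tokens = re.findall(r"[a-z]+", text.lower())
    counts = Counter()
    for token in tokens:
        for category, words in dictionary.items():
            if token in words:
                counts[category] += 1
    total = len(tokens)
    return {
        category: (counts[category] / total if total else 0.0)
        for category in dictionary
    }

scores = score_text("The knight wore armor behind the wall")
print(scores)
```

Normalising by token count makes scores comparable across texts of different lengths, which matters when two sets of texts are compared as in the study.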


ReCALL | 1997

Teaching grammar again after twenty years: corpus-based help for teaching grammar

Tony McEnery; Andrew Wilson; Paul Barker

In this paper we consider how corpora may be of use in the teaching of grammar at the pre-tertiary level. Corpora are becoming well established in teaching in universities. Corpora also have a role to play in secondary education, in that they can help decide how and what to teach, as well as changing the way in which pupils learn and providing the possibility of open-ended machine-aided tuition. Corpora also seem to provide what UK government-sponsored reports on teaching grammar have called for – a data-driven approach to the subject.


Proceedings of the Workshop on Multiword Expressions: Identifying and Exploiting Underlying Properties | 2006

Measuring MWE Compositionality Using Semantic Annotation

Scott Piao; Paul Rayson; Olga Mudraya; Andrew Wilson; Roger Garside

This paper reports on an experiment in which we explore a new approach to the automatic measurement of multi-word expression (MWE) compositionality. We propose an algorithm which ranks MWEs by their compositionality relative to a semantic field taxonomy based on the Lancaster English semantic lexicon (Piao et al., 2005a). The semantic information provided by the lexicon is used for measuring the semantic distance between a MWE and its constituent words. The algorithm is evaluated both on 89 manually ranked MWEs and on McCarthy et al.'s (2003) manually ranked phrasal verbs. We compared the output of our tool with human judgments using Spearman's rank-order correlation coefficient. Our evaluation shows that the automatic ranking of the majority of our test data (86.52%) has strong to moderate correlation with the manual ranking while wide discrepancy is found for a small number of MWEs. Our algorithm also obtained a correlation of 0.3544 with manual ranking on McCarthy et al.'s test data, which is comparable to or better than most of the measures they tested. This experiment demonstrates that a semantic lexicon can assist in MWE compositionality measurement in addition to statistical algorithms.
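The comparison between automatic and manual rankings rests on a rank-order correlation coefficient. A minimal sketch of the standard tie-free formula is shown below; the example rankings are invented for illustration and are not data from the paper, and the function name `spearman_rho` is an assumption.

```python
def spearman_rho(ranks_a, ranks_b):
    """Rank-order correlation for two rankings of the same items, assuming no ties.

    Uses rho = 1 - 6 * sum(d^2) / (n * (n^2 - 1)), where d is the rank
    difference for each item.
    """
    if len(ranks_a) != len(ranks_b):
        raise ValueError("rankings must cover the same number of items")
    n = len(ranks_a)
    if n < 2:
        raise ValueError("need at least two items")
    sum_sq_diff = sum((a - b) ** 2 for a, b in zip(ranks_a, ranks_b))
    return 1 - (6 * sum_sq_diff) / (n * (n ** 2 - 1))

# Hypothetical rankings of five expressions (1 = most compositional).
manual_ranks = [1, 2, 3, 4, 5]
automatic_ranks = [2, 1, 3, 5, 4]

rho = spearman_rho(manual_ranks, automatic_ranks)
print(round(rho, 4))  # 0.8
```

When rankings contain ties, this shortcut formula no longer applies exactly; a library routine that handles ties would be the safer choice in that case.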


Journal for the Study of the New Testament | 1992

The Pragmatics of Politeness and Pauline Epistolography: a Case Study of the Letter To Philemon

Andrew Wilson

Through a case study analysis of Paul's letter to Philemon using the framework of Geoffrey Leech's interpersonal rhetoric, this paper suggests that the techniques of modern linguistic pragmatics, in particular politeness theory, constitute a valuable approach to the Pauline writings by directing attention to the ways in which the relationship between the author and his addressees has affected the linguistic expression of ideas. The paper demonstrates that in Philemon considerations of politeness have significantly affected the way in which Paul makes his request. It is suggested that the analysis of other epistles by this method may pay important dividends for exegesis.


Archive | 2006

Quantitative or Qualitative Content Analysis? Experiences from a cross-cultural comparison of female students' attitudes to shoe fashions in Germany, Poland and Russia.

Andrew Wilson; Olga Moudraia

In order to examine differences in attitudes to shoe fashions between women in Germany, Poland and Russia, we asked three samples of advanced female students of English to write a short English composition in response to the stimulus: “Tell us a little bit about the footwear (shoes, boots, etc.) you own and when you wear it”. We analysed the results using a manual qualitative content analysis and two forms of quantitative computer content analysis: one using project-specific categories developed from the qualitative content analysis and previous theory, the other using general semantic field categories. Both techniques were successful in highlighting similar between-group differences, suggesting that qualitative content analysis and project-specific categories can largely be dispensed with. Some issues in using non-native student English compositions as data in cross-cultural studies are also considered.


Literary and Linguistic Computing | 2011

The Regressive Imagery Dictionary: a test of its concurrent validity in English, German, Latin, and Portuguese.

Andrew Wilson

Since the 1970s, the Regressive Imagery Dictionary (RID) has been widely used as a content analysis tool for both psychological and literary research on texts. Today, besides the original English version, it exists in translations for seven other languages. However, the wide-ranging validation studies conducted on the English version have mostly not been replicated for the various translations, hence the validity of these translations must rest for the time being on their concurrent validity with the English original. This article examines the concurrent validity of the German, Latin, and Portuguese translations of the RID. Taking the English RID as a de facto standard, it uses translations of the psalms (N = 150) to check how far the three translations of the RID correspond to the English original in identifying whether there is a significant dominance of primary or secondary process lexis in a text. Overall, compared against the English version, the Latin translation has 77.33% accuracy, the German translation 68%, and the Portuguese translation 56.67%. In terms of the sensitivity and specificity of classification, the Latin translation performs quite well on both measures; in contrast, the German translation is conservative, whilst the Portuguese translation is liberal.
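The accuracy percentages in this abstract correspond to whole numbers of psalms out of 150, and sensitivity and specificity are the usual confusion-matrix ratios. The sketch below checks the first point from the reported figures and illustrates the second with invented toy counts. It treats the English RID's classification as the reference and simplifies the task to a binary decision; both choices, and the function name `classification_metrics`, are assumptions made for illustration only.

```python
N_PSALMS = 150

# Number of psalms classified in agreement with the English RID,
# implied by the reported accuracies (116/150, 102/150, 85/150).
agreements = {"Latin": 116, "German": 102, "Portuguese": 85}
accuracies = {
    language: round(count / N_PSALMS * 100, 2)
    for language, count in agreements.items()
}
print(accuracies)

def classification_metrics(tp, fn, fp, tn):
    """Accuracy, sensitivity and specificity from confusion-matrix counts."""
    total = tp + fn + fp + tn
    return {
        "accuracy": (tp + tn) / total,
        "sensitivity": tp / (tp + fn),  # share of reference positives detected
        "specificity": tn / (tn + fp),  # share of reference negatives detected
    }

# Invented toy counts, not results from the article.
toy = classification_metrics(tp=8, fn=2, fp=3, tn=7)
print(toy)
```

In these terms, a conservative translation misses reference positives (lower sensitivity, higher specificity), whereas a liberal one flags too many texts (higher sensitivity, lower specificity).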

Collaboration


Dive into Andrew Wilson's collaborations.

Top Co-Authors

Dawn Archer

University of Central Lancashire
