Publication


Featured research published by Tony McEnery.


Discourse & Society | 2008

A useful methodological synergy? Combining critical discourse analysis and corpus linguistics to examine discourses of refugees and asylum seekers in the UK press

Paul Baker; Costas Gabrielatos; Majid KhosraviNik; Michal Krzyzanowski; Tony McEnery; Ruth Wodak

This article discusses the extent to which methods normally associated with corpus linguistics can be effectively used by critical discourse analysts. Our research is based on the analysis of a 140-million-word corpus of British news articles about refugees, asylum seekers, immigrants and migrants (collectively RASIM). We discuss how processes such as collocation and concordance analysis were able to identify common categories of representation of RASIM as well as directing analysts to representative texts in order to carry out qualitative analysis. The article suggests a framework for adopting corpus approaches in critical discourse analysis.
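
Collocation and concordance analysis are standard corpus-linguistic operations. As an illustrative sketch only (the authors used their own toolchain on the RASIM corpus, which is not reproduced here), this is roughly how such queries look in Python with NLTK; the corpus file name is a hypothetical stand-in:

```python
# Minimal sketch of concordance and collocation analysis with NLTK.
# 'news_articles.txt' is a hypothetical stand-in for the RASIM news corpus.
import nltk
from nltk.collocations import BigramCollocationFinder, BigramAssocMeasures

# Naive whitespace tokenisation; real work would use a proper tokeniser.
tokens = open("news_articles.txt", encoding="utf-8").read().lower().split()

# KWIC concordance lines around a node word.
nltk.Text(tokens).concordance("refugees", width=80, lines=10)

# Collocates within a 5-word window, ranked by log-likelihood,
# filtered to pairs involving one of the RASIM terms.
finder = BigramCollocationFinder.from_words(tokens, window_size=5)
finder.apply_freq_filter(5)  # ignore rare pairs
rasim = {"refugees", "asylum", "seekers", "immigrants", "migrants"}
for (w1, w2), score in finder.score_ngrams(BigramAssocMeasures.likelihood_ratio):
    if w1 in rasim or w2 in rasim:
        print(w1, w2, round(score, 2))
```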


Computer Speech & Language | 2005

Comparing and combining a semantic tagger and a statistical tool for MWE extraction

Scott Piao; Paul Rayson; Dawn Archer; Tony McEnery

Automatic extraction of multiword expressions (MWEs) presents a tough challenge for the NLP community and corpus linguistics. Indeed, although numerous knowledge-based symbolic approaches and statistically driven algorithms have been proposed, efficient MWE extraction still remains an unsolved issue. In this paper, we evaluate the Lancaster UCREL Semantic Analysis System (henceforth USAS (Rayson, P., Archer, D., Piao, S., McEnery, T., 2004. The UCREL semantic analysis system. In: Proceedings of the LREC-04 Workshop, Beyond Named Entity Recognition: Semantic Labelling for NLP Tasks, Lisbon, Portugal. pp. 7-12)) for MWE extraction, and explore the possibility of improving USAS by incorporating a statistical algorithm. Developed at Lancaster University, the USAS system automatically annotates English corpora with semantic category information. Employing a large-scale semantically classified multiword expression template database, the system is also capable of detecting many multiword expressions, as well as assigning semantic field information to the MWEs extracted. Whilst USAS therefore offers a unique tool for MWE extraction, allowing us to both extract and semantically classify MWEs, it can sometimes suffer from low recall. Consequently, we have been comparing USAS, which employs a symbolic approach, to a statistical tool, which is based on collocational information, in order to determine the pros and cons of these different tools, and more importantly, to examine the possibility of improving MWE extraction by combining them. As we report in this paper, we have found a highly complementary relationship between the two tools: USAS missed many domain-specific MWEs (law/court terms in this case), and the statistical tool missed many commonly used MWEs that occur at low frequencies (fewer than three occurrences in this case). Given this complementary relationship, we propose that MWE coverage can be significantly increased by combining a lexicon-based symbolic approach and a collocation-based statistical approach.
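
The proposed combination is easy to picture as the union of two passes. A minimal sketch, assuming a toy template lexicon in place of USAS's large-scale database and pointwise mutual information in place of the paper's collocation measure (neither stands for the actual systems):

```python
# Sketch of the hybrid idea: union of a lexicon-based (symbolic) pass and a
# collocation-based (statistical) pass. The lexicon, file name, and PMI
# threshold are hypothetical stand-ins, not the paper's actual resources.
from collections import Counter
from math import log

def lexicon_mwes(tokens, lexicon):
    """Symbolic pass: longest-match lookup against an MWE template lexicon."""
    found, i = set(), 0
    while i < len(tokens):
        for n in (4, 3, 2):  # try longer templates first
            cand = " ".join(tokens[i:i + n])
            if cand in lexicon:
                found.add(cand)
                i += n - 1
                break
        i += 1
    return found

def statistical_mwes(tokens, min_freq=3, min_pmi=5.0):
    """Statistical pass: bigrams ranked by pointwise mutual information."""
    unigrams, bigrams = Counter(tokens), Counter(zip(tokens, tokens[1:]))
    n, out = len(tokens), set()
    for (w1, w2), f in bigrams.items():
        if f >= min_freq:
            pmi = log((f / n) / ((unigrams[w1] / n) * (unigrams[w2] / n)), 2)
            if pmi >= min_pmi:
                out.add(f"{w1} {w2}")
    return out

tokens = open("court_transcripts.txt", encoding="utf-8").read().lower().split()
lexicon = {"in spite of", "as well as", "by and large"}  # toy stand-in for USAS templates
combined = lexicon_mwes(tokens, lexicon) | statistical_mwes(tokens)
```

The min_freq cutoff of three mirrors the frequency floor the abstract identifies as the statistical tool's blind spot.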


Meeting of the Association for Computational Linguistics | 2003

Extracting Multiword Expressions with a Semantic Tagger

Scott Piao; Paul Rayson; Dawn Archer; Andrew Wilson; Tony McEnery

Automatic extraction of multiword expressions (MWEs) presents a tough challenge for the NLP community and corpus linguistics. Although various statistically driven or knowledge-based approaches have been proposed and tested, efficient MWE extraction still remains an unsolved issue. In this paper, we present research in which we tested an approach to the MWE problem based on a semantic field annotator. We use an English semantic tagger (USAS) developed at Lancaster University to identify multiword units which depict single semantic concepts. The Meter Corpus (Gaizauskas et al., 2001; Clough et al., 2002) built in Sheffield was used to evaluate our approach. In our evaluation, this approach extracted a total of 4,195 MWE candidates, of which, after manual checking, 3,792 were accepted as valid MWEs, producing a precision of 90.39% and an estimated recall of 39.38%. Of the accepted MWEs, 68.22% (2,587) are low-frequency terms, occurring only once or twice in the corpus. These results show that our approach provides a practical solution to MWE extraction.
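
The reported figures can be checked directly from the counts given in the abstract; the implied size of the gold-standard MWE set is an inference from the estimated recall, not a number stated in the paper:

```python
# Checking the reported figures: precision = accepted / extracted.
accepted, extracted = 3_792, 4_195
precision = accepted / extracted          # 0.9039... -> 90.39%, as reported

# The estimated recall of 39.38% implies a gold-standard total of
# roughly accepted / recall true MWEs in the corpus (~9,629).
est_recall = 0.3938
implied_gold_total = accepted / est_recall
print(f"precision = {precision:.2%}, implied gold total = {implied_gold_total:.0f}")
```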


ReCALL | 1997

Teaching and Language Corpora (TALC)

Tony McEnery; Andrew Wilson

In choosing a title for this paper, we have consciously copied the name of the series of biennial conferences, started at Lancaster in 1994, which aim to bring together those who have an interest in the application of corpora to the teaching of language and linguistics. Already, those conferences have set in train a series of publications – conference proceedings (Wilson and McEnery, 1994; Botley, Glass, McEnery and Wilson, 1996), a general selection of papers (Wichmann, Knowles, McEnery and Fligelstone, 1997) and a collection of papers related to multilingual corpora (Botley, McEnery and Wilson, 1997). The aim of this paper is to summarize the progress to date in the field of teaching and language corpora, both as a general introduction and as a gateway to the more comprehensive literature which is developing. As such, this paper owes a considerable debt to all of the participants at the past two conferences.


Computer Assisted Language Learning | 1995

A Statistical Analysis of Corpus-Based Computer vs. Traditional Human Teaching Methods of Part-of-Speech Analysis

Tony McEnery

Two approaches to teaching grammar were compared with respect to accuracy of participant response over time. Of the seventeen first-year English Language undergraduates who participated in the seven-week experiment, nine were taught grammar via the traditional classroom-based “human teacher” method, while the remainder used CyberTutor, a corpus-based computer-aided linguistic learning program. This program allowed subjects to annotate sentences whilst providing instant feedback and help facilities. The computer-aided group outperformed their human-taught counterparts in terms of accuracy and number of words analysed. At the end of the experiment, mean accuracy was 89.34% for the computer-aided group, whereas it was only 13.64% for the human-taught group. The overall finding was that, in terms of teaching the parts of speech at least, corpus-based CALL programs may be more effective than traditional classroom interaction.
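
For illustration only, here is how such a two-group comparison is commonly tested for significance. The per-participant scores below are invented placeholders chosen to reproduce the reported group sizes and means, and Welch's t-test is an assumption, not the paper's stated procedure:

```python
# Hedged sketch of a two-sample comparison like the one the paper reports.
# Scores are invented placeholders (only the group sizes, 8 vs. 9, and the
# means 89.34% / 13.64% come from the abstract).
from scipy import stats

call_group  = [91.0, 88.5, 90.2, 87.9, 89.6, 90.8, 88.4, 88.3]        # hypothetical, n=8
human_group = [14.1, 12.9, 13.5, 15.0, 12.8, 13.2, 14.7, 13.0, 13.6]  # hypothetical, n=9

t, p = stats.ttest_ind(call_group, human_group, equal_var=False)  # Welch's t-test
print(f"CALL mean: {sum(call_group)/len(call_group):.2f}%, "
      f"human mean: {sum(human_group)/len(human_group):.2f}%, t = {t:.2f}, p = {p:.3g}")
```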


Literary and Linguistic Computing | 2004

Corpus linguistics and South Asian languages: corpus creation and tool development

Paul Baker; Andrew Hardie; Tony McEnery; Richard Xiao; Kalina Bontcheva; Hamish Cunningham; Robert J. Gaizauskas; Oana Hamza; Diana Maynard; Valentin Tablan; Cristian Ursu; B. D. Jayaram; Mark Leisher

This paper describes the work carried out on the EMILLE Project (Enabling Minority Language Engineering), which was undertaken by the Universities of Lancaster and Sheffield. The primary resource developed by the project is the EMILLE Corpus, which consists of a series of monolingual corpora for fourteen South Asian languages, totalling more than 96 million words, and a parallel corpus of English and five of these languages. The EMILLE Corpus also includes an annotated component, namely, part-of-speech tagged Urdu data, together with twenty written Hindi corpus files annotated to show the nature of demonstrative use in Hindi. In addition, the project has had to address a number of issues related to establishing a language engineering (LE) environment for South Asian language processing, such as translating 8-bit language data into Unicode and producing a number of basic LE tools. The development of tools for EMILLE has contributed to the ongoing development of the LE architecture GATE, which has been extended to make use of Unicode. GATE thus plugs some of the gaps for language processing R&D necessary for the exploitation of the EMILLE corpora.
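
The 8-bit-to-Unicode conversion the paper mentions reduces to mapping legacy byte values onto code points. A minimal sketch, with a hypothetical two-entry mapping table standing in for the full per-encoding tables the project would have needed:

```python
# Sketch of converting font-specific 8-bit data to Unicode via a mapping table.
# The byte-to-codepoint entries and file name are hypothetical fragments; the
# real EMILLE conversion used full tables for each legacy font/encoding.
EIGHT_BIT_TO_UNICODE = {
    0xA1: "\u0905",  # hypothetical: byte 0xA1 -> DEVANAGARI LETTER A
    0xA2: "\u0906",  # hypothetical: byte 0xA2 -> DEVANAGARI LETTER AA
}

def to_unicode(raw: bytes) -> str:
    # ASCII passes through; mapped bytes become Devanagari; unknown bytes
    # are flagged with U+FFFD so conversion gaps are visible downstream.
    out = []
    for b in raw:
        if b < 0x80:
            out.append(chr(b))
        else:
            out.append(EIGHT_BIT_TO_UNICODE.get(b, "\ufffd"))
    return "".join(out)

with open("legacy_hindi.txt", "rb") as f:
    text = to_unicode(f.read())
```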


English Studies | 2005

HELP or HELP to: What Do Corpora Have to Say?

Tony McEnery; Richard Xiao

In this paper, we examine a range of factors that may potentially influence a language user's choice of a full or bare infinitive following HELP. The factors include language variety, language change, the spoken/written distinction, semantic distinction, and syntactic conditions, namely: an intervening noun phrase or adverbial, the number of intervening words, to preceding HELP, the passive construction, inflections of HELP, and it as the subject. Six corpora are used in this paper: four written corpora (LOB, Brown, FLOB and Frown) and two spoken corpora (the speech section of the BNC and the Corpus of Professional Spoken American English, CPSA).
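
A crude approximation of the underlying corpus query can be expressed with regular expressions; the study itself worked with POS-tagged written and spoken corpora, which this plain-text sketch (with a hypothetical file name) does not replicate:

```python
# Crude plain-text counts of HELP + to-infinitive vs. HELP + bare infinitive.
# Ignores intervening noun phrases/adverbials (a separate factor in the paper)
# and does no POS disambiguation; 'corpus.txt' is a hypothetical file.
import re

text = open("corpus.txt", encoding="utf-8").read().lower()

help_forms = r"\bhelp(?:s|ed|ing)?\b"
to_inf = re.findall(help_forms + r"\s+to\s+[a-z]+", text)
bare = re.findall(help_forms + r"\s+(?!to\b)[a-z]+", text)

print(f"HELP to + verb: {len(to_inf)}; HELP + next word (bare proxy): {len(bare)}")
```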


ReCALL | 1997

Teaching grammar again after twenty years: corpus-based help for teaching grammar

Tony McEnery; Andrew Wilson; Paul Barker

In this paper we consider how corpora may be of use in the teaching of grammar at the pre-tertiary level. Corpora are becoming well established in university teaching. They also have a role to play in secondary education, in that they can help decide how and what to teach, as well as changing the way in which pupils learn and providing the possibility of open-ended machine-aided tuition. Corpora also seem to provide what UK government-sponsored reports on teaching grammar have called for – a data-driven approach to the subject.


Archive | 2015

Who benefits when discourse gets democratised? Analysing a Twitter corpus around the British Benefits Street debate

Paul Baker; Tony McEnery

In this chapter we examine discourses on the social media site Twitter around people who receive government support (commonly referred to as benefits), in the UK. Between 2008–2009 and 2011–2012, the UK experienced recession, and after coming to power in 2010 the Conservative-led coalition government embarked on a program of fiscal austerity that included cuts to some benefits. Baker (forthcoming) analysed the discourse around benefits in Britain’s most widely-read newspaper The Sun (a conservative tabloid), comparing the years 2002 and 2012. In 2012, the discourse around benefits was less sympathetic towards many types of benefit recipients, with the newspaper notably focusing on two constructions: benefits cheats and benefits culture, which respectively resulted in negative stories at the level of both the individual and the wider society. The newspaper painted a compelling picture of a benefits system created by the previous government that was both too soft and open to abuse and thus in need of reform.


Corpora | 2008

Construction and annotation of a corpus of contemporary Nepali

Yogendra P. Yadava; Andrew Hardie; Ram Raj Lohani; Bhim Narayan Regmi; Srishtee Gurung; Amar Gurung; Tony McEnery; Jens Allwood; Pat Hall

In this paper, we describe the construction of the 14-million-word Nepali National Corpus (NNC). This corpus includes both spoken and written data, the latter incorporating a Nepali match for FLOB and a broader collection of text. Additional resources within the NNC include parallel data (English–Nepali and Nepali–English) and a speech corpus. The NNC is encoded as Unicode text and marked up in CES-compatible XML. The whole corpus is also annotated with part-of-speech tags. We describe the process of devising a tagset and retraining tagger software for the Nepali language, for which there were no existing corpus resources. Finally, we explore some present and future applications of the corpus, including lexicography, NLP, and grammatical research.
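
Retraining a tagger for a language with no prior corpus resources follows a familiar train-and-evaluate pattern. A hedged NLTK sketch, assuming a hypothetical file of word/TAG tokens; the NNC project used its own tagset and tagger software, not NLTK:

```python
# Sketch: train a back-off n-gram tagger from 'word/TAG' data and evaluate it.
# 'nepali_tagged.txt' (one sentence per line) and the default tag are
# hypothetical stand-ins for the project's own resources.
import nltk

sents = []
with open("nepali_tagged.txt", encoding="utf-8") as f:
    for line in f:
        sents.append([tuple(tok.rsplit("/", 1)) for tok in line.split()])

cut = int(len(sents) * 0.9)
train, test = sents[:cut], sents[cut:]

# Back-off chain: bigram -> unigram -> fixed default tag.
default = nltk.DefaultTagger("NN")  # placeholder default, not an NNC tag
unigram = nltk.UnigramTagger(train, backoff=default)
bigram = nltk.BigramTagger(train, backoff=unigram)
print(f"held-out tagging accuracy: {bigram.accuracy(test):.2%}")
```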

Collaboration


Dive into Tony McEnery's collaborations.

Top Co-Authors

Dawn Archer
University of Central Lancashire