Publication


Featured research published by Andrew Hardie.


BMJ | 2017

The online use of Violence and Journey metaphors by patients with cancer, as compared with health professionals: a mixed methods study

Elena Semino; Zsofia Demjen; Jane Demmen; Veronika Koller; Sheila Payne; Andrew Hardie; Paul Rayson

Objective: To compare the frequencies with which patients with cancer and health professionals use Violence and Journey metaphors when writing online; and to investigate the use of these metaphors by patients with cancer, in view of critiques of war-related metaphors for cancer and the adoption of the notion of the ‘cancer journey’ in UK policy documents.
Design: Computer-assisted quantitative and qualitative study of two data sets totalling 753 302 words.
Setting: A UK-based online forum for patients with cancer (500 134 words) and a UK-based website for health professionals (253 168 words).
Participants: 56 patients with cancer writing online between 2007 and 2012; and 307 health professionals writing online between 2008 and 2013.
Results: Patients with cancer use both Violence metaphors and Journey metaphors approximately 1.5 times per 1000 words to describe their illness experience. In similar online writing, health professionals use each type of metaphor significantly less frequently. Patients’ Violence metaphors can express and reinforce negative feelings, but they can also be used in empowering ways. Journey metaphors can express and reinforce positive feelings, but can also be used in disempowering ways.
Conclusions: Violence metaphors are not by default negative and Journey metaphors are not by default a positive means of conceptualising cancer. A blanket rejection of Violence metaphors and an uncritical promotion of Journey metaphors would deprive patients of the positive functions of the former and ignore the potential pitfalls of the latter. Instead, greater awareness of the function (empowering or disempowering) of patients’ metaphor use can lead to more effective communication about the experience of cancer.


Literary and Linguistic Computing | 2004

Corpus linguistics and South Asian languages: corpus creation and tool development

Paul Baker; Andrew Hardie; Tony McEnery; Richard Xiao; Kalina Bontcheva; Hamish Cunningham; Robert J. Gaizauskas; Oana Hamza; Diana Maynard; Valentin Tablan; Cristian Ursu; B. D. Jayaram; Mark Leisher

This paper describes the work carried out on the EMILLE Project (Enabling Minority Language Engineering), which was undertaken by the Universities of Lancaster and Sheffield. The primary resource developed by the project is the EMILLE Corpus, which consists of a series of monolingual corpora for fourteen South Asian languages, totalling more than 96 million words, and a parallel corpus of English and five of these languages. The EMILLE Corpus also includes an annotated component, namely, part-of-speech tagged Urdu data, together with twenty written Hindi corpus files annotated to show the nature of demonstrative use in Hindi. In addition, the project has had to address a number of issues related to establishing a language engineering (LE) environment for South Asian language processing, such as translating 8-bit language data into Unicode and producing a number of basic LE tools. The development of tools for EMILLE has contributed to the ongoing development of the LE architecture GATE, which has been extended to make use of Unicode. GATE thus plugs some of the gaps for language processing R&D necessary for the exploitation of the EMILLE corpora.


Transactions in GIS | 2015

Automatically Analyzing Large Texts in a GIS Environment: The Registrar General's Reports and Cholera in the 19th Century

Patricia Murrieta-Flores; Alistair Baron; Ian N. Gregory; Andrew Hardie; Paul Rayson

The aim of this article is to present new research showcasing how Geographic Information Systems in combination with Natural Language Processing and Corpus Linguistics methods can offer innovative avenues of research to analyze large textual collections in the Humanities, particularly in historical research. Using as examples parts of the collection of the Registrar General's Reports that contain more than 200,000 pages of descriptions, census data and vital statistics for the UK, we introduce newly developed automated textual tools and well-known spatial analyses used in combination to investigate a case study of the references made to cholera and other diseases in these historical sources, and their relationship to place-names during Victorian times. The integration of such techniques has allowed us to explore, in an automatic way, this historical source containing millions of words, to examine the geographies depicted in it, and to identify textual and geographic patterns in the corpus.


Corpora | 2008

Construction and annotation of a corpus of contemporary Nepali

Yogendra P. Yadava; Andrew Hardie; Ram Raj Lohani; Bhim Narayan Regmi; Srishtee Gurung; Amar Gurung; Tony McEnery; Jens Allwood; Pat Hall

In this paper, we describe the construction of the 14-million-word Nepali National Corpus (NNC). This corpus includes both spoken and written data, the latter incorporating a Nepali match for FLOB and a broader collection of text. Additional resources within the NNC include parallel data (English–Nepali and Nepali–English) and a speech corpus. The NNC is encoded as Unicode text and marked up in CES-compatible XML. The whole corpus is also annotated with part-of-speech tags. We describe the process of devising a tagset and retraining tagger software for the Nepali language, for which there were no existing corpus resources. Finally, we explore some present and future applications of the corpus, including lexicography, NLP, and grammatical research.


Corpus Linguistics and Linguistic Theory | 2008

A Collocation-based approach to Nepali postpositions

Andrew Hardie

Using the Nepali National Corpus, a collocation-based technique is applied to the categorization of Nepali postpositions. Ergative le, accusative lāī, and genitive ko/kā/kī are frequently considered in the literature to be part of the Nepali nominal inflection paradigm, but opinion differs on other postpositions. The most significant collocations of several postpositions are examined for patterns that characterize postpositions as a category or categories. Two overarching patterns are identified: collocation with semantically coherent nouns, and collocation with words for which the postposition functions as a subcategorizer. The analysis of some postpositions as part of the nominal paradigm is not supported by the analysis of these patterns. However, postpositions traditionally seen as part of the nominal paradigm do collocate more with non-lexical words, especially pronouns.


ICAME Journal | 2014

Modest XML for Corpora: Not a standard, but a suggestion

Andrew Hardie

This paper argues for, and presents, a modest approach to XML encoding for use by the majority of contemporary linguists who need to engage in corpus construction. While extensive standards for corpus encoding exist - most notably, the Text Encoding Initiative’s Guidelines and the Corpus Encoding Standard based on them - these are rather heavyweight approaches, implicitly intended for major corpus-building projects, which are rather different from the increasingly common efforts in corpus construction undertaken by individual researchers in support of their personal research goals. Therefore, there is a clear benefit to be had from a set of recommendations (not a standard) that outlines general best practices in the use of XML in corpora without going into any of the more technical aspects of XML or the full weight of TEI encoding. This paper presents such a set of suggestions, dubbed Modest XML for Corpora, and posits that such a set of pointers to a limited level of XML knowledge could work as part of the normal, general training of corpus linguists. The Modest XML recommendations cover the following set of things, which, according to the foregoing argument, are sufficient knowledge about XML for most corpus linguists’ day-to-day needs: use of tags; adding attribute value pairs; recommended use of attributes; nesting of tags; encoding of special characters; XML well-formedness; a collection of de facto standard tags and attributes; going beyond the basic de facto standard tags; and text headers.


Language Resources and Evaluation | 2007

From legacy encodings to Unicode: the graphical and logical principles in the scripts of South Asia

Andrew Hardie

Much electronic text in the languages of South Asia has been published on the Internet. However, while Unicode has emerged as the favoured encoding system of corpus and computational linguists, most South Asian language data on the web uses one of a wide range of non-standard legacy encodings. This paper describes the difficulties inherent in converting text in these encodings to Unicode. Among the various legacy encodings for South Asian scripts, the most problematic are 8-bit fonts based on graphical principles (as opposed to the logical principles of Unicode). Graphical fonts typically encode several features in ways highly incompatible with Unicode. For instance, half-form glyphs used to construct conjunct consonants are typically separate code points in 8-bit fonts; in Unicode they are represented by the full consonant followed by virama. There are many more such cases. The solution described here is an approach to text conversion based on mapping rules. A small number of generalised rules (plus the capacity for more specialised rules) captures the behaviour of each character in a font, building up a conversion algorithm for that encoding. This system is embedded in a font-mapping program, outputting CES-compliant SGML Unicode. This program, a generalised text-conversion tool, has been employed extensively in corpus-building for South Asian languages.
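The half-form example in the abstract above can be illustrated with a minimal sketch of a table-driven mapping rule. This is not the font-mapping program the paper describes: the byte values and the names `LEGACY_TO_UNICODE` and `convert` are hypothetical, chosen only for illustration, since every real legacy font has its own layout. Only the Unicode code points (KA, YA and the Devanagari virama) are actual values.

```python
# Illustrative sketch only: the byte values below are hypothetical,
# not taken from any real 8-bit South Asian font.
VIRAMA = "\u094D"  # DEVANAGARI SIGN VIRAMA

# One rule per legacy code point. A half-form glyph maps to the full
# consonant followed by virama, as the abstract describes for Unicode.
LEGACY_TO_UNICODE = {
    0x6B: "\u0915",           # full KA
    0x4B: "\u0915" + VIRAMA,  # half-form KA -> KA + virama
    0x79: "\u092F",           # full YA
}


def convert(data: bytes) -> str:
    """Convert legacy-encoded bytes to a Unicode string, rule by rule."""
    out = []
    for b in data:
        if b not in LEGACY_TO_UNICODE:
            raise ValueError(f"no mapping rule for byte 0x{b:02X}")
        out.append(LEGACY_TO_UNICODE[b])
    return "".join(out)


# Half-form KA followed by full YA yields the conjunct sequence KA + virama + YA.
conjunct = convert(bytes([0x4B, 0x79]))
```

A plain lookup table only covers one-to-one and one-to-many rules. Cases that need context, such as reordering characters, would require the more specialised rules the abstract mentions.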


Serials: The Journal for The Serials Community | 2009

Freeing up digital content with text mining: new research means new licences

Alastair Dunning; Ian N. Gregory; Andrew Hardie

The method by which users have traditionally exploited digital resources such as Early English Books Online (EEBO) has been via keyword search. However, researchers are increasingly finding new ways to exploit entire corpora of digitized resources, treating the resource as a single entity to be analysed, rather than searching or sifting through the resource for individual parts. This article looks at the work of one research team at the University of Lancaster, exploring how they are using a corpus of seventeenth-century newsbooks to open up new areas of research. Using tools borrowed from linguistics and geography, the researchers can analyse the place names mentioned in the newsbooks and see which linguistic concepts (e.g. war, money) were associated with which geographical areas. Such work has implications not only for future research but also for how resource managers negotiate and manage the licences related to such resources.


Book | 2017

Metaphor, Cancer and the End of Life: A Corpus-based Study

Elena Semino; Zsofia Demjen; Andrew Hardie; Sheila Payne; Paul Rayson

© 2018 Taylor & Francis. All rights reserved. This book presents the methodology, findings and implications of a large-scale corpus-based study of the metaphors used to talk about cancer and the end of life (including care at the end of life) in the UK. It focuses on metaphor as a central linguistic and cognitive tool that is frequently used to talk and think about sensitive and subjective experiences, such as illness, emotions, death, and dying, and that can both help and hinder communication and well-being, depending on how it is used. The book centers on a combination of qualitative analyses and innovative corpus linguistic methods. This methodological assemblage was applied to the systematic study of the metaphors used in a 1.5-million-word corpus. The corpus consists of interviews with, and online forum posts written by, members of three stakeholder groups, namely: patients diagnosed with advanced cancer; unpaid carers looking after a relative with a diagnosis of advanced cancer; and healthcare professionals. The book presents a range of qualitative and quantitative findings that have implications for: metaphor theory and analysis; corpus linguistic and computational approaches to metaphor; and training and practice in cancer care and hospice, palliative and end-of-life care.


Journal of Siberian Federal University | 2016

From Digital Resources to Historical Scholarship with the British Library 19th Century Newspaper Collection

Ian N. Gregory; Paul Atkinson; Andrew Hardie; Amelia Joulain-Jay; Daniel Kershaw; Catherine Porter; Paul Rayson; Christopher John Rupp

It is increasingly acknowledged that the Digital Humanities have placed too much emphasis on data creation and that the major priority should be turning digital sources into contributions to knowledge. While this sounds relatively simple, doing it involves intermediate stages of research that enhance digital sources, develop new methodologies and explore their potential to generate new knowledge from the source. While these stages are familiar in the social sciences, they are less so in the humanities. In this paper we explore these stages based on research on the British Library’s Nineteenth Century Newspaper Collection, a corpus of many billion words that has much to offer to our understanding of the nineteenth century but whose size and complexity make it difficult to work with.
