Jack Grieve
Aston University
Publication
Featured research published by Jack Grieve.
Literary and Linguistic Computing | 2007
Jack Grieve
The basic assumption of quantitative authorship attribution is that the author of a text can be selected from a set of possible authors by comparing the values of textual measurements in that text to their corresponding values in each possible author's writing sample. Over the past three centuries, many types of textual measurements have been proposed, but never before have the majority of these measurements been tested on the same dataset. A large-scale comparison of textual measurements is crucial if current techniques are to be used effectively and if new and more powerful techniques are to be developed. This article presents the results of a comparison of thirty-nine different types of textual measurements commonly used in attribution studies, in order to determine which are the best indicators of authorship. Based on the results of these tests, a more accurate approach to quantitative authorship attribution is proposed, which involves the analysis of many different textual measurements.
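The basic assumption stated above can be illustrated with a minimal sketch. It is not the procedure evaluated in the article: the feature set (`FUNCTION_WORDS`), the whitespace tokenizer, and the sum-of-absolute-differences distance are all assumptions chosen for brevity, and the function names are hypothetical.

```python
from collections import Counter

# Illustrative feature set (an assumption, not the article's 39 measurement types).
FUNCTION_WORDS = ("the", "of", "and", "to", "a", "in")


def profile(text, features=FUNCTION_WORDS):
    """Return the relative frequency of each feature word in the text."""
    tokens = text.lower().split()
    counts = Counter(tokens)
    total = len(tokens) or 1  # avoid division by zero on empty input
    return [counts[word] / total for word in features]


def attribute(disputed_text, samples):
    """Pick the candidate whose writing sample is closest to the disputed text.

    samples maps author name -> writing sample. Closeness is the sum of
    absolute differences between relative frequencies (an assumed measure).
    """
    target = profile(disputed_text)

    def distance(author):
        candidate = profile(samples[author])
        return sum(abs(t - c) for t, c in zip(target, candidate))

    return min(samples, key=distance)


# Toy writing samples, invented for illustration only.
toy_samples = {
    "A": "the cat and the dog and the bird",
    "B": "a walk to a park in a town",
}
best_match = attribute("the fox and the hen", toy_samples)
```

With these toy samples, `best_match` is the candidate whose function-word rates sit nearest to those of the disputed text. Real studies would use far longer samples and a much richer set of measurements.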
Archive | 2010
Jack Grieve; Douglas Biber; Eric Friginal; Tatiana Nekrasova
A blog, short for a weblog, is a website containing an archive of regularly updated online postings. The postings are generally made by one person and presented in reverse chronological order. The archive is generally made freely available to the public. The postings tend to consist primarily of raw text, but may also contain hyperlinks and other media, including picture, video and sound files. Often blogs allow for readers to post comments as well.
Language Variation and Change | 2011
Jack Grieve; Dirk Speelman; Dirk Geeraerts
This paper introduces a method for the analysis of regional linguistic variation. The method identifies individual and common patterns of spatial clustering in a set of linguistic variables measured over a set of locations based on a combination of three statistical techniques: spatial autocorrelation, factor analysis, and cluster analysis. To demonstrate how to apply this method, it is used to analyze regional variation in the values of 40 continuously measured, high-frequency lexical alternation variables in a 26-million-word corpus of letters to the editor representing 206 cities from across the United States.
Computers, Environment and Urban Systems | 2016
Yuan Huang; Diansheng Guo; Alice Bee Kasakoff; Jack Grieve
We analyze a Big Data set of geo-tagged tweets for a year (Oct. 2013–Oct. 2014) to understand the regional linguistic variation in the U.S. Prior work on regional linguistic variations usually took a long time to collect data and focused on either rural or urban areas. Geo-tagged Twitter data offers an unprecedented database with rich linguistic representation of fine spatiotemporal resolution and continuity. From the one-year Twitter corpus, we extract lexical characteristics for Twitter users by summarizing the frequencies of a set of lexical alternations that each user has used. We spatially aggregate and smooth each lexical characteristic to derive county-based linguistic variables, from which orthogonal dimensions are extracted using principal component analysis (PCA). Finally, a regionalization method is used to discover hierarchical dialect regions using the PCA components. The regionalization results reveal interesting linguistic regional variations in the U.S. The discovered regions not only confirm past research findings in the literature but also provide new insights and a more detailed understanding of very recent linguistic patterns in the U.S.
Corpus Linguistics and Linguistic Theory | 2012
Jack Grieve
This paper investigates whether the position of adverb phrases in sentences is regionally patterned in written Standard American English, based on an analysis of a 25 million word corpus of letters to the editor representing the language of 200 cities from across the United States. Seven measures of adverb position were tested for regional patterns using the global spatial autocorrelation statistic Moran's I and the local spatial autocorrelation statistic Getis-Ord Gi*. Three of these seven measures were identified as exhibiting significant levels of spatial autocorrelation, contrasting the language of the Northeast with the language of the Southeast and the South Central states. These results demonstrate that continuous regional grammatical variation exists in American English and that regional linguistic variation exists in written Standard English.
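For readers unfamiliar with global Moran's I, the statistic can be sketched in a few lines using its standard textbook definition, I = (n / W) · Σᵢⱼ wᵢⱼ(xᵢ − x̄)(xⱼ − x̄) / Σᵢ(xᵢ − x̄)². The spatial weights matrix and the toy values below are assumptions made for illustration; the abstract does not state which weighting scheme the paper used, and no significance test is included here.

```python
def morans_i(values, weights):
    """Compute global Moran's I.

    values  -- one measurement per location
    weights -- square matrix; weights[i][j] is the spatial weight between
               locations i and j (the weighting scheme is up to the analyst)
    """
    n = len(values)
    mean = sum(values) / n
    deviations = [v - mean for v in values]

    weight_sum = sum(sum(row) for row in weights)
    variance_term = sum(d * d for d in deviations)
    if weight_sum == 0 or variance_term == 0:
        raise ValueError("Moran's I is undefined for zero weights or constant values")

    cross_products = sum(
        weights[i][j] * deviations[i] * deviations[j]
        for i in range(n)
        for j in range(n)
    )
    return (n / weight_sum) * (cross_products / variance_term)


# Four locations along a line; only immediate neighbours are connected.
chain_weights = [
    [0, 1, 0, 0],
    [1, 0, 1, 0],
    [0, 1, 0, 1],
    [0, 0, 1, 0],
]
clustered = morans_i([1, 2, 3, 4], chain_weights)      # similar values adjoin
alternating = morans_i([1, -1, 1, -1], chain_weights)  # dissimilar values adjoin
```

Positive values indicate that neighbouring locations tend to have similar measurements, and negative values indicate that neighbours tend to differ. In the toy example the smoothly increasing series yields a positive statistic and the alternating series a negative one.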
Archive | 2009
Douglas Biber; Jack Grieve; Gina Iberri-Shea
Written registers in English have undergone extensive stylistic change over the past four centuries, in response to changes in the purposes of communication, the demographics of the reading public and attitudinal preferences of authors. For example, Biber and Finegan (1989, 1997) document the way in which written prose registers in the seventeenth century were already quite different from conversational registers, and how those registers evolved to become even more distinct from speech over the course of the eighteenth century. Informational expository registers like medical prose and science prose have continued to develop more ‘literate’ styles over the last two centuries, including increasing use of passive verbs, relative clause constructions and elaborated noun phrases generally (see Atkinson 1992, 2001, Biber 1995: 280–313, Biber and Finegan 1997). These linguistic developments correspond to the development of a more specialized readership, more specialized purposes, and a fuller exploitation of the production possibilities of the written mode. That is, in marked contrast to the general societal trends towards a wider lay readership and the corresponding need for popular written registers, readers of medical research prose and science prose have become increasingly more specialized in their backgrounds and training, and correspondingly these registers have become more specialized in linguistic form. Surprisingly, even some more ‘popular’ registers, such as newspaper reportage, have followed a similar historical path (see Biber 2003). One linguistic domain that reflects these historical developments is the choice among structural devices used to modify noun phrases.
American Speech | 2013
Jack Grieve; Costanza Asnaghi; Tom Ruette
This article presents a new method for data collection in regional dialectology based on site-restricted web searches. The method measures the usage and determines the distribution of lexical variants across a region of interest using common web search engines, such as Google or Bing. The method involves estimating the proportions of the variants of a lexical alternation variable over a series of cities by counting the number of webpages that contain the variants on newspaper websites originating from these cities through site-restricted web searches. The method is evaluated by mapping the 26 variants of 10 lexical variables with known distributions in American English. In almost all cases, the maps based on site-restricted web searches align closely with traditional dialect maps based on data gathered through questionnaires, demonstrating the accuracy of this method for the observation of regional linguistic variation. However, unlike collecting dialect data using traditional methods, which is a relatively slow process, the use of site-restricted web searches allows for dialect data to be collected from across a region as large as the United States in a matter of days.
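The proportion-estimation step described above can be sketched as follows. The sketch covers only the arithmetic that turns page counts into proportions; it does not query any search engine. The function name, the city labels, the example variants, and every count are invented for illustration.

```python
def variant_proportions(page_counts):
    """Convert per-city page counts into per-city variant proportions.

    page_counts maps city -> {variant: number of matching webpages}.
    Cities with no hits for any variant are mapped to None rather than guessed.
    """
    proportions = {}
    for city, counts in page_counts.items():
        total = sum(counts.values())
        if total == 0:
            proportions[city] = None  # no evidence for this city
            continue
        proportions[city] = {
            variant: count / total for variant, count in counts.items()
        }
    return proportions


# Invented counts for a hypothetical two-variant lexical alternation.
example_counts = {
    "City A": {"sneakers": 30, "tennis shoes": 10},
    "City B": {"sneakers": 5, "tennis shoes": 15},
    "City C": {"sneakers": 0, "tennis shoes": 0},
}
result = variant_proportions(example_counts)
```

Each city's proportions sum to one, so they can be mapped directly and compared across cities regardless of how many pages each newspaper website contains.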
The Mind Research Repository | 2016
Martijn Wieling; Jack Grieve; Gosse Bouma; Josef Fruehwald; John Coleman; Mark Liberman
In this study, we investigate cross-linguistic patterns in the alternation between UM, a hesitation marker consisting of a neutral vowel followed by a final labial nasal, and UH, a hesitation marker consisting of a neutral vowel in an open syllable. Based on a quantitative analysis of a range of spoken and written corpora, we identify clear and consistent patterns of change in the use of these forms in various Germanic languages (English, Dutch, German, Norwegian, Danish, Faroese) and dialects (American English, British English), with the use of UM increasing over time relative to the use of UH. We also find that this pattern of change is generally led by women and more educated speakers and holds when functional differences between UM and UH are controlled. Finally, we propose a series of possible explanations for this surprising change in hesitation marker usage that is currently taking place across Germanic languages.
English Language and Linguistics | 2017
Jack Grieve; Andrea Nini; Diansheng Guo
This paper introduces a quantitative method for identifying newly emerging word forms in large time-stamped corpora of natural language and then describes an analysis of lexical emergence in American social media using this method based on a multi-billion word corpus of Tweets collected between October 2013 and November 2014. In total 29 emerging word forms, which represent various semantic classes, grammatical parts of speech, and word formation processes, were identified through this analysis. These 29 forms are then examined from various perspectives in order to begin to better understand the process of lexical emergence.
Proceedings of the First Workshop on Abusive Language Online | 2017
Isobelle Clarke; Jack Grieve
In this paper, we use a new categorical form of multidimensional register analysis to identify the main dimensions of functional linguistic variation in a corpus of abusive language, consisting of racist and sexist Tweets. By analysing the use of a wide variety of parts-of-speech and grammatical constructions, as well as various features related to Twitter and computer-mediated communication, we discover three dimensions of linguistic variation in this corpus, which we interpret as being related to the degree of interactive, antagonistic and attitudinal language exhibited by individual Tweets. We then demonstrate that there is a significant functional difference between racist and sexist Tweets, with sexist Tweets tending to be more interactive and attitudinal than racist Tweets.