Publication


Featured research published by Stefan Th. Gries.


Cognitive Linguistics | 2005

Converging evidence: Bringing together experimental and corpus data on the association of verbs and constructions

Doris Schönefeld; Stefan Th. Gries

Much recent work in Cognitive Linguistics and neighbouring disciplines has adopted a so-called usage-based perspective in which generalizations are based on the analysis of authentic usage data provided by computerized corpora. However, the analysis of such data does not always utilize methodological findings from other disciplines to avoid analytical pitfalls and, at the same time, generate robust results. A case in point is the strategy of using corpus frequencies. In this paper, we take up a recently much debated issue from construction grammar concerning the association between verbs and argument-structure constructions, and investigate a construction, the English as-predicative, in order to test the predictive power of different kinds of frequency data against that of a recent, more refined corpus-based approach, the so-called collexeme analysis. To that end, the results of applying these corpus-based approaches to the as-predicative are compared with the results of a sentence-completion experiment. Concerning the topic under consideration, collexeme analysis is not only shown to be superior on a variety of theoretical and methodological grounds, it also significantly outperforms frequency as a predictor of subjects’ production preferences. We conclude by pointing out some implications for usage-based approaches.
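The contrast between raw frequency and collexeme analysis can be made concrete with a small sketch. It assumes that association strength is measured with a one-tailed Fisher-Yates exact test on a 2x2 table (verb vs. other verbs, construction vs. elsewhere) and reported as a negative log10 p-value; the abstract does not spell out the statistic, and the function names and all counts below are hypothetical.

```python
from math import comb, log10


def fisher_exact_greater(a, b, c, d):
    """One-tailed Fisher-Yates exact p-value for the 2x2 table [[a, b], [c, d]].

    Returns the probability of observing `a` or more co-occurrences if the
    row and column variables were independent (hypergeometric upper tail).
    """
    row1, col1, n = a + b, a + c, a + b + c + d
    total = comb(n, col1)
    tail = sum(
        comb(row1, k) * comb(n - row1, col1 - k)
        for k in range(a, min(row1, col1) + 1)
    )
    return tail / total


def collostruction_strength(verb_in_cx, verb_total, cx_total, corpus_total):
    """Association of a verb with a construction as -log10(p).

    All four arguments are token counts; this is a sketch for modest counts,
    since extremely small p-values can underflow to 0.0.
    """
    a = verb_in_cx                  # verb inside the construction
    b = verb_total - a              # verb elsewhere
    c = cx_total - a                # other verbs inside the construction
    d = corpus_total - a - b - c    # other verbs elsewhere
    return -log10(fisher_exact_greater(a, b, c, d))


# Hypothetical counts: a rare verb that occurs mostly in the construction,
# and a frequent verb that occurs in it more often in absolute terms.
rare_verb = collostruction_strength(30, 40, 200, 10_000)
frequent_verb = collostruction_strength(40, 4_000, 200, 10_000)
print(rare_verb, frequent_verb)
```

With these made-up counts, raw frequency would rank the frequent verb higher (40 vs. 30 tokens in the construction), whereas the association measure ranks the rare verb higher because its observed count far exceeds what its overall frequency would lead one to expect.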


Corpus Linguistics and Linguistic Theory | 2009

Corpora and experimental methods: A state-of-the-art review

Gaëtanelle Gilquin; Stefan Th. Gries

This paper offers a state-of-the-art review of the combination of corpora and experimental methods. Using a sample of recent studies, it shows (i) that psycholinguists regularly exploit the benefits of combining corpus and experimental data, whereas corpus linguists do so much more rarely, and (ii) that psycholinguists and corpus linguists use corpora in different ways in terms of the dichotomy of exploratory/descriptive vs. hypothesis-testing as well as the corpus-linguistic methods that are used. Possible reasons for this are suggested and arguments are presented for why (and how) corpus linguists should look more into the possibilities of complementing their corpus studies with experimental data.


Corpus Linguistics and Linguistic Theory | 2006

Ways of trying in Russian: clustering behavioral profiles

Dagmar Divjak; Stefan Th. Gries

This article proposes a methodology for addressing three long-standing problems of near-synonym research. First, we show how the internal structure of a group of near synonyms can be revealed. Second, we deal with the problem of distinguishing the subclusters and the words in those subclusters from each other. Finally, we illustrate how these results identify the semantic properties that should be mentioned in lexicographic entries. We illustrate our methodology with a case study on nine near-synonymous Russian verbs that, in combination with an infinitive, express TRY. Our approach is corpus-linguistic and quantitative: assuming a strong correlation between semantic and distributional properties, we analyze 1,585 occurrences of these verbs taken from the Amsterdam Corpus and the Russian National Corpus, supplemented where necessary with data from the Web. We code each particular instance in terms of 87 variables (a.k.a. ID tags), i.e., morphosyntactic, syntactic and semantic characteristics that form a verb's behavioral profile. The resulting co-occurrence table is evaluated by means of a hierarchical agglomerative cluster analysis and additional quantitative methods. The results show that this behavioral profile approach can be used (i) to elucidate the internal structure of the group of near-synonymous verbs and present it as a radial network structured around a prototypical member and (ii) to make explicit the scales of variation along which the near-synonymous verbs vary.
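The clustering step can be sketched as follows. This is a minimal illustration, assuming each verb's behavioral profile is a vector of relative frequencies of ID-tag levels; the Euclidean distance, the average linkage, the placeholder verb labels, and the three-dimensional profiles are assumptions made for brevity, not the settings reported in the study.

```python
from itertools import combinations


def euclidean(u, v):
    """Euclidean distance between two equally long numeric vectors."""
    return sum((x - y) ** 2 for x, y in zip(u, v)) ** 0.5


def agglomerate(profiles):
    """Average-linkage hierarchical agglomerative clustering.

    `profiles` maps a label to its vector of relative frequencies.
    Returns the merge history as (cluster_a, cluster_b, distance) tuples,
    where each cluster is a tuple of labels.
    """
    def linkage(pair):
        a, b = pair
        dists = [euclidean(profiles[x], profiles[y]) for x in a for y in b]
        return sum(dists) / len(dists)

    clusters = [(name,) for name in profiles]
    merges = []
    while len(clusters) > 1:
        # Find the two most similar clusters and merge them.
        a, b = min(combinations(clusters, 2), key=linkage)
        merges.append((a, b, linkage((a, b))))
        clusters = [c for c in clusters if c not in (a, b)] + [a + b]
    return merges


# Hypothetical profiles: share of each verb's tokens per ID-tag level.
profiles = {
    "verb_a": [0.60, 0.30, 0.10],
    "verb_b": [0.55, 0.35, 0.10],
    "verb_c": [0.10, 0.20, 0.70],
}
for a, b, dist in agglomerate(profiles):
    print(a, "+", b, round(dist, 3))
```

The merge history is what a dendrogram visualizes: verbs with similar distributional profiles are joined early, and the order of the later merges suggests the subcluster structure.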


Literary and Linguistic Computing | 2009

Assessing frequency changes in multistage diachronic corpora: Applications for historical corpus linguistics and the study of language acquisition

Martin Hilpert; Stefan Th. Gries

The use of corpora that are divided into temporally ordered stages is becoming increasingly wide-spread in historical corpus linguistics. This development is partly due to the fact that more and more resources of this kind are being developed. Since the assessment of frequency changes over multiple periods of time is a relatively recent practice, there are few agreed-upon standards of how such trends should be statistically interpreted. This article addresses the need for a basic analytical toolbox that is specifically tailored to the interpretation of frequency changes in multistage diachronic corpora. We present a number of suggestions for the analysis of data that analysts commonly face in historical studies, but also in the study of language acquisition.
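One simple way to quantify a trend across ordered stages is a rank correlation between stage order and relative frequency. The sketch below uses Kendall's tau (the tau-a variant, without tie correction); this choice is an assumption for illustration, since the abstract does not name the statistics in its toolbox, and the frequencies are invented.

```python
def kendall_tau(xs, ys):
    """Kendall's tau-a: (concordant - discordant) / number of pairs.

    Tied pairs count as neither concordant nor discordant.
    """
    n = len(xs)
    concordant = discordant = 0
    for i in range(n):
        for j in range(i + 1, n):
            sign = (xs[i] - xs[j]) * (ys[i] - ys[j])
            if sign > 0:
                concordant += 1
            elif sign < 0:
                discordant += 1
    return (concordant - discordant) / (n * (n - 1) / 2)


# Hypothetical relative frequencies (per million words) in five periods.
periods = [1, 2, 3, 4, 5]
frequencies = [10, 12, 11, 15, 18]
print(kendall_tau(periods, frequencies))
```

A value near 1 indicates a consistent increase over time, a value near -1 a consistent decrease, and a value near 0 no monotonic trend.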


Corpora | 2008

The identification of stages in diachronic data: variability-based neighbour clustering

Stefan Th. Gries; Martin Hilpert

In this paper, we introduce a data-driven bottom-up clustering method for the identification of stages in diachronic corpus data that differ from each other quantitatively. Much like regular approaches to hierarchical clustering, it is based on identifying and merging the most cohesive groups of data points, but, unlike regular approaches to clustering, it allows for the merging of temporally adjacent data, thus, in effect, preserving the chronological order. We exemplify the method with two case studies, one on verbal complementation of shall, the other on the development of the perfect in English.
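The adjacency constraint can be sketched in a few lines. This is a minimal illustration, assuming one frequency value per period and using the population standard deviation of the pooled values as the (dis)similarity measure; the measure, the function name `vnc`, and the data are assumptions, not the paper's exact procedure.

```python
from statistics import pstdev


def vnc(values):
    """Neighbour clustering of chronologically ordered frequencies.

    Only temporally adjacent clusters may merge. Returns the merge history
    as ((first_index, last_index), variability) tuples.
    """
    clusters = [[v] for v in values]
    spans = [(i, i) for i in range(len(values))]
    history = []
    while len(clusters) > 1:
        # Score each pair of neighbours by the spread of their pooled values.
        scores = [
            pstdev(clusters[i] + clusters[i + 1])
            for i in range(len(clusters) - 1)
        ]
        i = min(range(len(scores)), key=scores.__getitem__)
        merged_span = (spans[i][0], spans[i + 1][1])
        history.append((merged_span, scores[i]))
        clusters[i:i + 2] = [clusters[i] + clusters[i + 1]]
        spans[i:i + 2] = [merged_span]
    return history


# Hypothetical frequencies for five consecutive periods.
for span, spread in vnc([10, 11, 30, 31, 12]):
    print(span, round(spread, 2))
```

In this made-up series the last period resembles the first two in frequency, but it can only ever merge with its chronological neighbours, which is the property that distinguishes this approach from unconstrained hierarchical clustering.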


Archive | 2006

Words and their metaphors: A corpus-based approach

Anatol Stefanowitsch; Stefan Th. Gries

In this paper, I propose and demonstrate a corpus-based approach to the investigation of metaphorical target domains based on retrieving representative lexical items from the target domain and identifying the metaphorical expressions associated with them. I show that this approach is superior in terms of data coverage compared to the traditional method of eclectically collecting citations or gathering data from introspection. In addition to its superior coverage, a corpus-based approach allows us to quantify the frequency of individual metaphors, and I show how central metaphors can be identified on the basis of such quantitative data. Finally, I argue that a focus on metaphors associated with individual lexical items opens up the possibility of investigating the interaction between metaphor and lexical semantics.


Corpus Linguistics and Linguistic Theory | 2005

Null-hypothesis significance testing of word frequencies: a follow-up on Kilgarriff

Stefan Th. Gries

In this issue of Corpus Linguistics and Linguistic Theory, Adam Kilgarriff discusses several issues concerned with the role of probabilistic modelling and statistical hypothesis testing in the domain of corpus linguistics and computational linguistics. Given the overall importance of these issues to the above-mentioned fields, I felt that the topic merits even more discussion and decided to add my own two cents with the hope that this discussion note triggers further commentaries or even some lively discussion and criticism. The points raised in Kilgarriff’s paper are various and important and considerations of space do not allow me to address all of them in as great detail as they certainly deserve. I will therefore concentrate on only one particular aspect of the paper which I find ‒ given my own research history and subjective interests ‒ particularly important, namely the issue of statistical hypothesis testing. More precisely, I will address one of the central claims of Kilgarriff’s paper. Kilgarriff argues ‒ apparently taking up issues from methodological discussion in many other disciplines (cf. section 2) ‒ that the efficiency of statistical null-hypothesis testing is often doubtful because (i) “[g]iven enough data, H0 is almost always rejected however arbitrary the data” and (ii) “true randomness is not possible at all”. In information-retrieval parlance, null-hypothesis significance testing when applied to large corpora yields too many false hits. In this short discussion note I would like to do two things. First, I would like to make a few suggestions as to what I think are the most natural methodological consequences of Kilgarriff’s statement and several other points of critique concerning null-hypothesis significance testing raised in other disciplines. Second, I would like to revisit one of the examples Kilgarriff discusses in his paper to exemplify aspects of these proposals and show how the results bear on corpus-linguistic issues.
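Claim (i) quoted above can be illustrated numerically. The sketch below computes the Pearson chi-square statistic for a 2x2 table together with the phi coefficient as an effect size, and applies both to the same proportions at two corpus sizes. The counts are invented, and pairing the test with an effect size is offered as one common response to the problem, not as a summary of the note's own proposals.

```python
def chi_square_2x2(a, b, c, d):
    """Pearson chi-square (no continuity correction) and phi for [[a, b], [c, d]]."""
    n = a + b + c + d
    chi2 = n * (a * d - b * c) ** 2 / ((a + b) * (c + d) * (a + c) * (b + d))
    phi = (chi2 / n) ** 0.5
    return chi2, phi


CRITICAL_05_DF1 = 3.841  # chi-square critical value for p = .05, df = 1

# Hypothetical counts: identical proportions, corpus 100 times larger.
small = chi_square_2x2(52, 48, 48, 52)
large = chi_square_2x2(5200, 4800, 4800, 5200)

for label, (chi2, phi) in (("small", small), ("large", large)):
    print(label, round(chi2, 2), round(phi, 3), chi2 > CRITICAL_05_DF1)
```

Because the chi-square statistic grows linearly with sample size when proportions are held constant, the larger table crosses the significance threshold while the effect size stays the same.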


Archive | 2010

Dispersions and adjusted frequencies in corpora: further explorations

Stefan Th. Gries

In order to adjust observed frequencies of occurrence, previous studies have suggested a variety of measures of dispersion and adjusted frequencies. In a previous study, I reviewed many of these measures and suggested an alternative measure, DP (for ‘deviation of proportions’), which I argued to be conceptually simpler and more versatile than many competing measures. However, despite the relevance of dispersion for virtually all corpus-linguistic work, it is still a very much under-researched topic: to the best of my knowledge, there is not a single study investigating how different measures compare to each other when applied to large datasets, nor is there any work that attempts to determine how different measures match up with the kind of psycholinguistic data that dispersions and adjusted frequencies are supposed to represent. This article takes exploratory steps in both of these directions.
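As a rough illustration of DP, the sketch below follows the definition as I understand it from the earlier study, which this abstract does not restate: for each corpus part, take the absolute difference between the share of the word's occurrences found in that part and the share of the corpus that part makes up, sum the differences, and halve the total. The function name and the counts are hypothetical.

```python
def deviation_of_proportions(word_counts, part_sizes):
    """DP for one word across corpus parts.

    `word_counts[i]` is the word's frequency in part i and `part_sizes[i]`
    is the size of part i in tokens. Values near 0 indicate an even
    distribution; values near 1 indicate a clumped one.
    """
    total_word = sum(word_counts)
    total_size = sum(part_sizes)
    return 0.5 * sum(
        abs(v / total_word - s / total_size)
        for v, s in zip(word_counts, part_sizes)
    )


# Hypothetical corpus with three equally sized parts.
sizes = [1000, 1000, 1000]
print(deviation_of_proportions([10, 10, 10], sizes))  # evenly spread word
print(deviation_of_proportions([30, 0, 0], sizes))    # word confined to one part
```

Both hypothetical words have the same overall frequency of 30, so the difference in their scores reflects dispersion alone.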


Language and Linguistics Compass | 2009

What is Corpus Linguistics?

Stefan Th. Gries

Corpus linguistics is one of the fastest-growing methodologies in contemporary linguistics. In a conversational format, this article answers a few questions that corpus linguists regularly face from linguists who have not used corpus-based methods so far. It discusses some of the central assumptions (‘formal distributional differences reflect functional differences’), notions (corpora, representativity and balancedness, markup and annotation), and methods of corpus linguistics (frequency lists, concordances, collocations), and discusses a few ways in which the discipline still needs to mature. At a recent LSA meeting … [with an obvious bow to Frederick Newmeyer] Question: So, I hear you’re a corpus linguist. Interesting, I get to see more and more


Journal of Quantitative Linguistics | 2001

A Multifactorial Analysis of Syntactic Variation: Particle Movement Revisited

Stefan Th. Gries

The present paper investigates the word order alternation of English transitive phrasal verbs such as to pick up the book versus to pick the book up. It builds on traditional monofactorial analyses, but argues that previously used methods of analysis are grossly inadequate to describe, explain and predict the word order choice by native speakers. A hypothesis integrating virtually all relevant variables ever postulated is proposed and investigated from a multifactorial perspective (using GLM, linear discriminant analysis and CART). As a result, more than 84% of native speakers’ choices can be predicted. Further implications (linguistic and methodological) are discussed.

Collaboration


Dive into Stefan Th. Gries's collaborations.

Top Co-Authors

Sandra C. Deshors

New Mexico State University
