R. Harald Baayen
University of Tübingen
Publication
Featured research published by R. Harald Baayen.
International Journal of Psychological Research | 2010
R. Harald Baayen; Petar Milin
Reaction times (RTs) are an important source of information in experimental psychology. Classical methodological considerations pertaining to the statistical analysis of RT data are optimized for analyses of aggregated data, based on subject or item means (cf. Forster & Dickinson, 1976). Mixed-effects modeling (see, e.g., Baayen, Davidson, & Bates, 2008) does not require prior aggregation and allows the researcher the more ambitious goal of predicting individual responses. Mixed-effects modeling calls for a reconsideration of the classical methodological strategies for analysing RTs. In this study, we argue for empirical flexibility with respect to the choice of transformation for the RTs. We advocate minimal a-priori data trimming, combined with model criticism. We also show how trial-to-trial, longitudinal dependencies between individual observations can be brought into the statistical model. These strategies are illustrated for a large dataset with a non-trivial random-effects structure. Special attention is paid to the evaluation of interactions involving fixed-effect factors that partition the levels sampled by random-effect factors.
Computers and the Humanities | 1998
Fiona J. Tweedie; R. Harald Baayen
A well-known problem in the domain of quantitative linguistics and stylistics concerns the evaluation of the lexical richness of texts. Since the most obvious measure of lexical richness, the vocabulary size (the number of different word types), depends heavily on the text length (measured in word tokens), a variety of alternative measures has been proposed which are claimed to be independent of the text length. This paper has a threefold aim. Firstly, we have investigated to what extent these alternative measures are truly textual constants. We have observed that in practice all measures vary substantially and systematically with the text length. We also show that in theory, only three of these measures are truly constant or nearly constant. Secondly, we have studied the extent to which these measures tap into different aspects of lexical structure. We have found that there are two main families of constants, one measuring lexical richness and one measuring lexical repetition. Thirdly, we have considered to what extent these measures can be used to investigate questions of textual similarity between and within authors. We propose to carry out such comparisons by means of the empirical trajectories of texts in the plane spanned by the dimensions of lexical richness and lexical repetition, and we provide a statistical technique for constructing confidence intervals around the empirical trajectories of texts. Our results suggest that the trajectories tap into a considerable amount of authorial structure without, however, guaranteeing that spatial separation implies a difference in authorship.
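The dependence of vocabulary size on text length described in this abstract can be illustrated with a minimal sketch. It is not the authors' method: the function name `type_token_profile`, the toy Zipf-like text, and the choice of the type-token ratio V/N as the example measure are all assumptions made here for illustration.

```python
import random


def type_token_profile(tokens, steps):
    """Return (N, V, V/N) for each prefix length N in `steps`.

    N is the number of word tokens, V the number of distinct word types.
    """
    profile = []
    for n in steps:
        prefix = tokens[:n]
        v = len(set(prefix))  # vocabulary size of the first n tokens
        profile.append((n, v, v / n))
    return profile


# Hypothetical toy "text": 20,000 tokens drawn from a Zipf-like
# distribution over 5,000 made-up word types (not real corpus data).
rng = random.Random(0)
vocab = [f"w{i}" for i in range(1, 5001)]
weights = [1.0 / i for i in range(1, 5001)]
tokens = rng.choices(vocab, weights=weights, k=20000)

profile = type_token_profile(tokens, [1000, 5000, 20000])
for n, v, ttr in profile:
    print(f"N={n:6d}  V={v:5d}  V/N={ttr:.3f}")
```

In this toy sample, V keeps growing with N while V/N falls, which is the kind of systematic variation with text length that the paper examines for the proposed constants.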
Psychological Review | 2011
R. Harald Baayen; Petar Milin; Dušica Filipović Đurđević; Peter Hendrix; Marco Marelli
A 2-layer symbolic network model based on the equilibrium equations of the Rescorla-Wagner model (Danks, 2003) is proposed. The study first presents 2 experiments in Serbian, which reveal for sentential reading the inflectional paradigmatic effects previously observed by Milin, Filipović Đurđević, and Moscoso del Prado Martín (2009) for unprimed lexical decision. The empirical results are successfully modeled without having to assume separate representations for inflections or data structures such as inflectional paradigms. In the next step, the same naive discriminative learning approach is pitted against a wide range of effects documented in the morphological processing literature. Frequency effects for complex words as well as for phrases (Arnon & Snider, 2010) emerge in the model without the presence of whole-word or whole-phrase representations. Family size effects (Moscoso del Prado Martín, Bertram, Häikiö, Schreuder, & Baayen, 2004; Schreuder & Baayen, 1997) emerge in the simulations across simple words, derived words, and compounds, without derived words or compounds being represented as such. It is shown that for pseudo-derived words no special morpho-orthographic segmentation mechanism, as posited by Rastle, Davis, and New (2004), is required. The model also replicates the finding of Plag and Baayen (2009) that, on average, words with more productive affixes elicit longer response latencies; at the same time, it predicts that productive affixes afford faster response latencies for new words. English phrasal paradigmatic effects modulating isolated word reading are reported and modeled, showing that the paradigmatic effects characterizing Serbian case inflection have crosslinguistic scope.
Trends in Cognitive Sciences | 2005
Jennifer Hay; R. Harald Baayen
Morphology is the study of the internal structure of words. A vigorous ongoing debate surrounds the question of how such internal structure is best accounted for: by means of lexical entries and deterministic symbolic rules, or by means of probabilistic subsymbolic networks implicitly encoding structural similarities in connection weights. In this review, we separate the question of subsymbolic versus symbolic implementation from the question of deterministic versus probabilistic structure. We outline a growing body of evidence, mostly external to the above debate, indicating that morphological structure is indeed intrinsically graded. By allowing probability into the grammar, progress can be made towards solving some long-standing puzzles in morphological theory.
Language and Cognitive Processes | 2000
Nivja H. De Jong; Robert Schreuder; R. Harald Baayen
It has been reported that in visual lexical decision response latencies to simplex nouns are shorter when these nouns have large morphological families, i.e., when they appear as constituents in large numbers of derived words and compounds. This study presents the results of four experiments that show that verbs have a Family Size effect independently of nominal conversion alternants, that this effect is a strict type frequency effect and not a token frequency effect, that the effect is co-determined by the morphological structure of the inflected verb, and that it occurs irrespective of the orthographic shape of the base word.
Language Variation and Change | 2012
Sali A. Tagliamonte; R. Harald Baayen
What is the explanation for vigorous variation between was and were in plural existential constructions and what is the optimal tool for analyzing it? The standard variationist tool, the variable rule program, is a generalized linear model; however, recent developments in statistics have introduced new tools, including mixed-effects models, random forests and conditional inference trees. In a step-by-step demonstration, we show how this well-known variable benefits from these complementary techniques. Mixed-effects models provide a principled way of assessing the importance of random-effect factors such as the individuals in the sample. Random forests provide information about the importance of predictors, whether factorial or continuous, and do so also for unbalanced designs with high multicollinearity, cases for which the family of linear models is less appropriate. Conditional inference trees straightforwardly visualize how multiple predictors operate in tandem. Taken together, the results confirm that polarity, distance from verb to plural element and the nature of the DP are significant predictors. Ongoing linguistic change and social reallocation via morphologization are operational. Furthermore, the results make predictions that can be tested in future research. We conclude that variationist research can be substantially enriched by an expanded tool kit.
Brain and Language | 2002
Mirjam Ernestus; R. Harald Baayen; Robert Schreuder
This article addresses the recognition of reduced word forms, which are frequent in casual speech. We describe two experiments on Dutch showing that listeners only recognize highly reduced forms well when these forms are presented in their full context and that the probability that a listener recognizes a word form in limited context is strongly correlated with the degree of reduction of the form. Moreover, we show that the effect of degree of reduction can only partly be interpreted as the effect of the intelligibility of the acoustic signal, which is negatively correlated with degree of reduction. We discuss the consequences of our findings for models of spoken word recognition and especially for the role that storage plays in these models.
Journal of the Acoustical Society of America | 2005
Mark Pluymaekers; Mirjam Ernestus; R. Harald Baayen
This study investigates the effects of lexical frequency on the durational reduction of morphologically complex words in spoken Dutch. The hypothesis that high-frequency words are more reduced than low-frequency words was tested by comparing the durations of affixes occurring in different carrier words. Four Dutch affixes were investigated, each occurring in a large number of words with different frequencies. The materials came from a large database of face-to-face conversations. For each word containing a target affix, one token was randomly selected for acoustic analysis. Measurements were made of the duration of the affix as a whole and the durations of the individual segments in the affix. For three of the four affixes, a higher frequency of the carrier word led to shorter realizations of the affix as a whole, individual segments in the affix, or both. Other relevant factors were the sex and age of the speaker, segmental context, and speech rate. To accommodate these findings, models of speech production should allow word frequency to affect the acoustic realizations of lower-level units, such as individual speech sounds occurring in affixes.
Cognition | 2004
Fermín Moscoso del Prado Martín; Aleksandar Kostić; R. Harald Baayen
In this study we introduce an information-theoretical formulation of the emergence of type- and token-based effects in morphological processing. We describe a probabilistic measure of the informational complexity of a word, its information residual, which encompasses the combined influences of the amount of information contained by the target word and the amount of information carried by its nested morphological paradigms. By means of re-analyses of previously published data on Dutch words we show that the information residual outperforms the combination of traditional token- and type-based counts in predicting response latencies in visual lexical decision, and at the same time provides a parsimonious account of inflectional, derivational, and compounding processes.
Trends in Linguistics. Studies and Monographs; 151 | 2003
R. Harald Baayen; Robert Schreuder
This volume brings together a series of studies of morphological processing in Germanic (English, German, Dutch), Romance (French, Italian), and Slavic (Polish, Serbian) languages. The question of how morphologically complex words are organized and processed in the mental lexicon is addressed from different theoretical perspectives (single and dual route models), for different modalities (auditory and visual comprehension, writing), and for language development. Experimental work is reported, as well as computational and statistical modeling. Thus, this volume provides a useful overview of the range of issues currently attracting research at the intersection of morphology and psycholinguistics.