Andreea S. Calude
University of Waikato
Publication
Featured research published by Andreea S. Calude.
Proceedings of the National Academy of Sciences of the United States of America | 2013
Mark Pagel; Quentin D. Atkinson; Andreea S. Calude; Andrew Meade
The search for ever deeper relationships among the world’s languages is bedeviled by the fact that most words evolve too rapidly to preserve evidence of their ancestry beyond 5,000 to 9,000 y. On the other hand, quantitative modeling indicates that some “ultraconserved” words exist that might be used to find evidence for deep linguistic relationships beyond that time barrier. Here we use a statistical model, which takes into account the frequency with which words are used in common everyday speech, to predict the existence of a set of such highly conserved words among seven language families of Eurasia postulated to form a linguistic superfamily that evolved from a common ancestor around 15,000 y ago. We derive a dated phylogenetic tree of this proposed superfamily with a time-depth of ∼14,450 y, implying that some frequently used words have been retained in related forms since the end of the last ice age. Words used more than once per 1,000 in everyday speech were 7- to 10-times more likely to show deep ancestry on this tree. Our results suggest a remarkable fidelity in the transmission of some words and give theoretical justification to the search for features of language that might be preserved across wide spans of time and geography.
Philosophical Transactions of the Royal Society B | 2011
Andreea S. Calude; Mark Pagel
We present data from 17 languages on the frequency with which a common set of words is used in everyday language. The languages are drawn from six language families representing 65 per cent of the world’s 7000 languages. Our data were collected from linguistic corpora that record frequencies of use for the 200 meanings in the widely used Swadesh fundamental vocabulary. Our interest is to assess evidence for shared patterns of language use around the world, and for the relationship of language use to rates of lexical replacement, defined as the replacement of a word by a new unrelated or non-cognate word. Frequencies of use for words in the Swadesh list range from just a few per million words of speech to 191 000 or more. The average inter-correlation among languages in the frequency of use across the 200 words is 0.73 (p < 0.0001). The first principal component of these data accounts for 70 per cent of the variance in frequency of use. Elsewhere, we have shown that frequently used words in the Indo-European languages tend to be more conserved, and that this relationship holds separately for different parts of speech. A regression model combining the principal factor loadings derived from the worldwide sample along with their part of speech predicts 46 per cent of the variance in the rates of lexical replacement in the Indo-European languages. This suggests that Indo-European lexical replacement rates might be broadly representative of worldwide rates of change. Evidence for this speculation comes from using the same factor loadings and part-of-speech categories to predict a word’s position in a list of 110 words ranked from slowest to most rapidly evolving among 14 of the world’s language families. This regression model accounts for 30 per cent of the variance.
Our results point to a remarkable regularity in the way that human speakers use language, and hint that the words for a shared set of meanings have been slowly evolving and others more rapidly evolving throughout human history.
Journal of Linguistics | 2014
Martin Haspelmath; Andreea S. Calude; Michael Spagnol; Heiko Narrog; Elif Bamyacı
We propose, and provide corpus-based support for, a usage-based explanation for cross-linguistic trends in the coding of causal–noncausal verb pairs, such as raise/rise, break (tr.)/ break (intr.). While English mostly uses the same verb form both for the causal and the noncausal sense (labile coding), most languages have extra coding for the causal verb (causative coding) and/or for the noncausal verb (anticausative coding). Causative and anticausative coding is not randomly distributed (Haspelmath 1993): Some verb meanings, such as ‘freeze’, ‘dry’ and ‘melt’, tend to be coded as causatives, while others, such as ‘break’, ‘open’ and ‘split’, tend to be coded as anticausatives. We propose an explanation of these coding tendencies on the basis of the form–frequency correspondence principle, which is a general efficiency principle that is responsible for many grammatical asymmetries, ultimately grounded in predictability of frequently expressed meanings. In corpus data from seven languages, we find that verb pairs for which the noncausal member is more frequent tend to be coded as anticausatives, while verb pairs for which the causal member is more frequent tend to be coded as causatives. Our approach implies that linguists should not rely on form–meaning parallelism when trying to explain cross-linguistic or language-particular patterns in this domain.
International Journal of Bilingual Education and Bilingualism | 2015
Jeanine Treffers-Daller; Andreea S. Calude
Learning to talk about motion in a second language is very difficult because it involves restructuring deeply entrenched patterns from the first language. In this paper we argue that statistical learning can explain why L2 learners are only partially successful in restructuring their second language grammars. We explore to what extent L2 learners make use of two mechanisms of statistical learning, entrenchment and pre-emption, to acquire target-like expressions of motion and retreat from overgeneralisation in this domain. Paying attention to the frequency of existing patterns in the input can help learners to adjust the frequency with which they use path and manner verbs in French but is insufficient to acquire the boundary crossing constraint and learn what not to say. We also look at the role of language proficiency and exposure to French in explaining the findings.
Corpus Linguistics and Linguistic Theory | 2017
Andreea S. Calude; Steven Miller; Mark Pagel
Loanword use has dominated the literature on language contact and its salient nature continues to draw interest from linguists and non-linguists. Traditionally, loanwords were investigated by means of raw frequencies, which are at best uninformative and at worst misleading. Following a new wave of studies which look at loans from a quantitatively more informed standpoint, modelling “success” by taking into account frequency of the counterparts available in the language adopting the loanwords, we propose a similar model of loan-use and demonstrate its benefits in a case study of loanwords from Māori into (New Zealand) English. Our model contributes to previous work in this area by combining the success measure mentioned above with a rich range of linguistic characteristics of the loanwords (such as loan length and word class), as well as a similarly detailed group of sociolinguistic characteristics of the speakers using them (gender, age and ethnicity of both speakers and addressees). Our model is unique in bringing together all these factors at the same time. The findings presented here illustrate the benefit of a quantitatively balanced approach to modelling loanword use. Furthermore, they illustrate the complex interaction between linguistic and sociolinguistic factors in such language contact scenarios.
Australian Journal of Linguistics | 2017
Andreea S. Calude
continue this new direction of investigation by adding languages and extending the research to Köhlerian motifs, phrases, clauses, strophes, Belza-chains, paragraphs, rhetorical units, poetic figures, etc. Considering the large number of previous efforts to capture lengths of individual units in individual languages, the above unifying approach can be viewed as welcome scientific progress. The authors apply a continuous function to discrete lengths instead of using a discrete distribution. This is justified by the fact that discrete and continuous are merely different concepts for describing the same phenomenon. It must be emphasized that any linguistic unit is only an adequate definition adapted to a context, e.g. in spoken language there are no phonemes but only sounds, no sentences but only utterances. Every determination of a theoretical linguistic unit is a problem of definition, i.e. a way of looking at reality—not part of reality itself. It is to be hoped that the approach taken in this book will influence the scientific view of linguistics. It would be very useful if one could also find unifying models for other linguistic properties, but this is a future challenge. Unified Modeling of Length in Language is a pioneering work and it is to be expected that it will find a number of followers. The mathematics is kept at an elementary level and the fitting of the theoretical functions to the observed data is performed by means of software specifically developed for this purpose (Fitter for discrete distributions and NLREG and TableCurves for continuous functions).
Journal of Linguistics | 2014
Martin Haspelmath; Andreea S. Calude; Michael Spagnol; Heiko Narrog; Elif Bamyacı
The penultimate sentence of the abstract of the article by Haspelmath et al. (2014), ‘In corpus data from seven languages, we find that verb pairs for which the noncausal member is more frequent tend to be coded as anticausatives, while verb pairs for which the causal member is more frequent tend to be coded as causatives’, should read: In corpus data from seven languages, we find that verb pairs for which the causal member is more frequent tend to be coded as anticausatives, while verb pairs for which the noncausal member is more frequent tend to be coded as causatives.
Proceedings of the National Academy of Sciences of the United States of America | 2013
Mark Pagel; Quentin D. Atkinson; Andreea S. Calude; Andrew Meade
Mahowald and Gibson (1) suggest that the shorter word length of frequently used words, and not their stability, could mean that chance sound correspondences account for the pattern of results we report for cognate relationships among proto-words in our study of seven Eurasian language families (2). However, their −0.24 correlation between the phonological length of contemporary English words and our measure of cognate class size is not relevant to the question of chance sound correspondences among the proto-words from different language families. What must be demonstrated is that shorter proto-words were more likely to be judged cognate simply on the basis of their length.
Journal of Language Evolution | 2016
Andreea S. Calude; Annemarie Verkerk
Archive | 2014
Andreea S. Calude; Mark Pagel