
Publication


Featured research published by Paula Buttery.


English Profile Journal | 2010

Criterial Features in Learner Corpora: Theory and Illustrations

John A. Hawkins; Paula Buttery

One of the major goals of the Cambridge English Profile Programme is to identify ‘criterial features’ for each of the Common European Framework of Reference (CEFR) proficiency levels as they apply to English, and to assess the impact of different first languages on these features (through ‘transfer’ effects). The present paper defines what is meant by criterial features and proposes an initial taxonomy of four types. Numerous illustrations are given from our collaborative research to date on the Cambridge Learner Corpus. The benefits and challenges posed by these features for corpus linguistics and for theories of second language acquisition are briefly outlined, as are the benefits and challenges for language assessment practices and for publishing ventures that make use of them as supplements to the current CEFR descriptors.


Nucleic Acids Research | 2011

UKPMC: a full text article resource for the life sciences

Johanna McEntyre; Sophia Ananiadou; Stephen Andrews; William J. Black; Richard Boulderstone; Paula Buttery; David Chaplin; Sandeepreddy Chevuru; Norman Cobley; Lee Ann Coleman; Paul Davey; Bharti Gupta; Lesley Haji-Gholam; Craig Hawkins; Alan Horne; Simon J. Hubbard; Jee Hyub Kim; Ian Lewin; Vic Lyte; Ross MacIntyre; Sami Mansoor; Linda Mason; John McNaught; Elizabeth Newbold; Chikashi Nobata; Ernest Ong; Sharmila Pillai; Dietrich Rebholz-Schuhmann; Heather Rosie; Rob Rowbotham

UK PubMed Central (UKPMC) is a full-text article database that extends the functionality of the original PubMed Central (PMC) repository. The UKPMC project was launched as the first ‘mirror’ site to PMC, which, in analogy to the International Nucleotide Sequence Database Collaboration, aims to provide international preservation of the open and free-access biomedical literature. UKPMC (http://ukpmc.ac.uk) has undergone considerable development since its inception in 2007 and now includes both a UKPMC and PubMed search, as well as access to other records such as Agricola, Patents and recent biomedical theses. UKPMC also differs from PubMed/PMC in that the full text and abstract information can be searched in an integrated manner from one input box. Furthermore, UKPMC contains ‘Cited By’ information as an alternative way to navigate the literature and has incorporated text-mining approaches to semantically enrich content and integrate it with related database resources. Finally, UKPMC also offers added-value services (UKPMC+) that enable grantees to deposit manuscripts, link papers to grants, publish online portfolios and view citation information on their papers. Here we describe UKPMC and clarify the relationship between PMC and UKPMC, providing historical context and future directions, 10 years on from when PMC was first launched.


North American Chapter of the Association for Computational Linguistics | 2009

Biomedical Event Extraction without Training Data

Andreas Vlachos; Paula Buttery; Diarmuid Ó Séaghdha; Ted Briscoe

We describe our system for the BioNLP 2009 event detection task. It is designed to be as domain-independent and unsupervised as possible. Nevertheless, the precisions achieved for single theme event classes range from 75% to 92%, while maintaining reasonable recall. The overall F-scores achieved were 36.44% and 30.80% on the development and the test sets respectively.


PLOS ONE | 2015

Adaptive Communication: Languages with More Non-Native Speakers Tend to Have Fewer Word Forms

Christian Bentz; Annemarie Verkerk; Douwe Kiela; Felix Hill; Paula Buttery

Explaining the diversity of languages across the world is one of the central aims of typological, historical, and evolutionary linguistics. We consider the effect of language contact (the number of non-native speakers a language has) on the way languages change and evolve. By analysing hundreds of languages within and across language families, regions, and text types, we show that languages with greater levels of contact typically employ fewer word forms to encode the same information content (a property we refer to as lexical diversity). Based on three types of statistical analyses, we demonstrate that this variance can in part be explained by the impact of non-native speakers on information encoding strategies. Finally, we argue that languages are information encoding systems shaped by the varying needs of their speakers. Language evolution and change should be modeled as the co-evolution of multiple intertwined adaptive systems: On one hand, the structure of human societies and human learning capabilities, and on the other, the structure of language.
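The abstract's notion of lexical diversity, fewer distinct word forms for the same content, can be illustrated with a minimal sketch. This is not the paper's actual measure (the study uses large parallel corpora and statistical controls); it simply counts distinct word forms in equal-length token samples, and the two toy "texts" below are invented for illustration:

```python
from collections import Counter

def lexical_diversity(tokens, sample_size=1000):
    """Number of distinct word forms among the first `sample_size` tokens.

    A crude stand-in for the paper's idea: a text that relies on many
    inflected forms yields more distinct types than one that reuses a
    small set of simplex words plus function words.
    """
    sample = tokens[:sample_size]
    return len(set(sample))

# Toy illustration: a 'synthetic'-style text with many inflected forms
# vs. an 'analytic'-style text expressing similar content with fewer forms.
synthetic = "puella puellae puellam puellarum puellis rosa rosae rosam".split() * 125
analytic = "the girl of the girl to the girl the rose of the rose".split() * 77

print(lexical_diversity(synthetic))  # 8 distinct forms
print(lexical_diversity(analytic))   # 5 distinct forms
```

Equal-length samples matter here: raw type counts grow with text length, so comparisons are only meaningful at a fixed token count.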


Corpus Linguistics and Linguistic Theory | 2014

Zipf's law and the grammar of languages: A quantitative study of Old and Modern English parallel texts

Christian Bentz; Douwe Kiela; Felix Hill; Paula Buttery

This paper reports a quantitative analysis of the relationship between word frequency distributions and morphological features in languages. We analyze a commonly-observed process of historical language change: The loss of inflected forms in favour of ‘analytic’ periphrastic constructions. These tendencies are observed in parallel translations of the Book of Genesis in Old English and Modern English. We show that there are significant differences in the frequency distributions of the two texts, and that parts of these differences are independent of total number of words, style of translation, orthography or contents. We argue that they derive instead from the trade-off between synthetic inflectional marking in Old English and analytic constructions in Modern English. By exploiting the earliest ideas of Zipf, we show that the syntheticity of the language in these texts can be captured mathematically, a property we tentatively call their grammatical fingerprint. Our findings suggest implications for both the specific historical process of inflection loss and more generally for the characterization of languages based on statistical properties.
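The Zipfian statistic the abstract builds on relates a word's frequency to its frequency rank. A minimal sketch of one such statistic, the slope of a least-squares fit of log frequency against log rank, is below. Assumptions: tokens are already segmented, and a plain log-log regression stands in for the paper's more careful "grammatical fingerprint" analysis:

```python
import math
from collections import Counter

def zipf_slope(tokens):
    """Least-squares slope of log(frequency) vs. log(rank).

    Classic Zipfian behaviour gives a slope near -1. The intuition in
    the abstract is that synthetic texts (many inflected forms, each
    rarer) and analytic texts (few forms reused often) distribute
    frequency mass differently, which such statistics can pick up.
    """
    freqs = sorted(Counter(tokens).values(), reverse=True)
    xs = [math.log(rank) for rank in range(1, len(freqs) + 1)]
    ys = [math.log(f) for f in freqs]
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var = sum((x - mean_x) ** 2 for x in xs)
    return cov / var

# Toy example: frequencies 4, 2, 1 over ranks 1, 2, 3.
tokens = ["a"] * 4 + ["b"] * 2 + ["c"]
print(round(zipf_slope(tokens), 3))  # roughly -1.234
```

On real corpora one would fit over thousands of ranks, where the log-log relationship is far more stable than in this three-type toy.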


Frontiers in Computational Neuroscience | 2015

Tracking cortical entrainment in neural activity: auditory processes in human temporal cortex.

Andrew Thwaites; Ian Nimmo-Smith; Elisabeth Fonteneau; Roy D. Patterson; Paula Buttery; William D. Marslen-Wilson

A primary objective for cognitive neuroscience is to identify how features of the sensory environment are encoded in neural activity. Current auditory models of loudness perception can be used to make detailed predictions about the neural activity of the cortex as an individual listens to speech. We used two such models (loudness-sones and loudness-phons), varying in their psychophysiological realism, to predict the instantaneous loudness contours produced by 480 isolated words. These two sets of 480 contours were used to search for electrophysiological evidence of loudness processing in whole-brain recordings of electro- and magneto-encephalographic (EMEG) activity, recorded while subjects listened to the words. The technique identified a bilateral sequence of loudness processes, predicted by the more realistic loudness-sones model, that begin in auditory cortex at ~80 ms and subsequently reappear, tracking progressively down the superior temporal sulcus (STS) at lags from 230 to 330 ms. The technique was then extended to search for regions sensitive to the fundamental frequency (F0) of the voiced parts of the speech. It identified a bilateral F0 process in auditory cortex at a lag of ~90 ms, which was not followed by activity in STS. The results suggest that loudness information is being used to guide the analysis of the speech stream as it proceeds beyond auditory cortex down STS toward the temporal pole.


Proceedings of the Workshop on Cognitive Aspects of Computational Language Acquisition | 2007

I will shoot your shopping down and you can shoot all my tins---Automatic Lexical Acquisition from the CHILDES Database

Paula Buttery; Anna Korhonen

Empirical data regarding the syntactic complexity of children's speech is important for theories of language acquisition. Currently much of this data is absent in the annotated versions of the CHILDES database. In this preliminary study, we show that a state-of-the-art subcategorization acquisition system of Preiss et al. (2007) can be used to extract large-scale subcategorization (frequency) information from the (i) child and (ii) child-directed speech within the CHILDES database without any domain-specific tuning. We demonstrate that the acquired information is sufficiently accurate to confirm and extend previously reported research findings. We also report qualitative results which can be used to further improve parsing and lexical acquisition technology for child language data in the future.


Proceedings of the 5th Workshop on Cognitive Aspects of Computational Language Learning (CogACLL) | 2014

Towards a computational model of grammaticalization and lexical diversity

Christian Bentz; Paula Buttery

Languages use different lexical inventories to encode information, ranging from small sets of simplex words to large sets of morphologically complex words. Grammaticalization theories argue that this variation arises as the outcome of diachronic processes whereby co-occurring words merge into one word and build up complex morphology. To model these processes we present a) a quantitative measure of lexical diversity and b) a preliminary computational model of changes in lexical diversity over several generations of merging highly frequent collocates.
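The merging process the abstract describes, frequent collocates fusing into single complex words across generations, can be sketched as a simple iterative procedure. This is a hypothetical simplification for illustration only, not the paper's model; it fuses the single most frequent adjacent word pair per "generation":

```python
from collections import Counter

def merge_top_collocate(tokens):
    """One illustrative 'generation' of grammaticalization: fuse the
    most frequent adjacent word pair into a single complex word.

    Repeated over generations, this shrinks the token count while
    growing the inventory of morphologically complex forms, changing
    the lexical diversity of the text.
    """
    bigrams = Counter(zip(tokens, tokens[1:]))
    if not bigrams:
        return tokens
    (a, b), _count = bigrams.most_common(1)[0]
    merged, i = [], 0
    while i < len(tokens):
        if i + 1 < len(tokens) and tokens[i] == a and tokens[i + 1] == b:
            merged.append(a + "+" + b)  # the fused complex word
            i += 2
        else:
            merged.append(tokens[i])
            i += 1
    return merged

print(merge_top_collocate("i will go i will stay".split()))
# ['i+will', 'go', 'i+will', 'stay']
```

The same fuse-the-most-frequent-pair idea underlies byte-pair encoding in modern NLP tokenizers, which makes it a natural computational analogue for this kind of diachronic merging.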


Journal of Quantitative Linguistics | 2017

Variation in Word Frequency Distributions: Definitions, Measures and Implications for a Corpus-Based Language Typology

Christian Bentz; Dimitrios Alikaniotis; Tanja Samardžić; Paula Buttery

Word frequencies are central to linguistic studies investigating processing difficulty, learnability, age of acquisition, diachronic transmission and the relative weight given to a concept in society. However, there are few cross-linguistic studies on entire distributions of word frequencies, and even fewer on systematic changes within them. Here, we first define and test an exact measure for the relative difference between distributions – the Normalised Frequency Difference (NFD). We then apply this measure to parallel corpora in 19 languages overall, explaining systematic variation in the frequency distributions within the same language and across different languages. We further establish the NFD between lemmatised and un-lemmatised corpora as a frequency-based measure of inflectional productivity of a language. Finally, we argue that quantitative measures like the NFD can advance language typology beyond abstract, theory-driven expert judgments, towards more corpus-based, empirical and reproducible analyses.
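The general idea of a normalised difference between two word-frequency distributions can be sketched as follows. Note this is a stand-in, not the paper's exact NFD definition (which should be consulted directly); here the half-summed absolute difference of relative frequencies, i.e. total variation distance, serves as an illustrative normalised measure in [0, 1]:

```python
from collections import Counter

def freq_difference(tokens_a, tokens_b):
    """Illustrative normalised difference between two word-frequency
    distributions: half the summed absolute difference of relative
    frequencies over the combined vocabulary (total variation distance).

    Returns 0.0 for identical distributions, 1.0 for disjoint ones.
    Comparing a lemmatised corpus against its un-lemmatised source with
    such a measure gives a rough index of inflectional productivity:
    the more inflection, the more lemmatisation collapses forms and
    reshapes the distribution.
    """
    counts_a, counts_b = Counter(tokens_a), Counter(tokens_b)
    n_a, n_b = len(tokens_a), len(tokens_b)
    vocab = set(counts_a) | set(counts_b)
    return 0.5 * sum(abs(counts_a[w] / n_a - counts_b[w] / n_b) for w in vocab)

print(freq_difference("the cat sat".split(), "the cat sat".split()))  # 0.0
print(freq_difference(["walk"], ["walked"]))                          # 1.0
```

Because the measure works on relative frequencies, it is insensitive to corpus size differences, which is the kind of normalisation a cross-linguistic comparison needs.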


Text, Speech and Dialogue | 2015

Incremental Dependency Parsing and Disfluency Detection in Spoken Learner English

Russell Moore; Andrew Caines; Calbert Graham; Paula Buttery

This paper investigates the suitability of state-of-the-art natural language processing (NLP) tools for parsing the spoken language of second language learners of English. The task of parsing spoken learner-language is important to the domains of automated language assessment (ALA) and computer-assisted language learning (CALL). Due to the non-canonical nature of spoken language (containing filled pauses, non-standard grammatical variations, hesitations and other disfluencies) and compounded by a lack of available training data, spoken language parsing has been a challenge for standard NLP tools. Recently the Redshift parser (Honnibal et al., Proceedings of CoNLL 2013) has been shown to be successful in identifying grammatical relations and certain disfluencies in native speaker spoken language, returning unlabelled dependency accuracy of 90.5% and a disfluency F-measure of 84.1% (Honnibal & Johnson, TACL 2, 131-142, 2014). We investigate how this parser handles spoken data from learners of English at various proficiency levels. Firstly, we find that Redshift's parsing accuracy on non-native speech data is comparable to Honnibal & Johnson's results, with 91.1% of dependency relations correctly identified. However, disfluency detection is markedly down, with an F-measure of just 47.8%. We attempt to explain why this should be, and investigate the effect of proficiency level on parsing accuracy. We relate our findings to the use of NLP technology for CALL and ALA applications.

Collaboration


An overview of Paula Buttery's collaborations.

Top Co-Authors

Ted Briscoe, University of Cambridge
Douwe Kiela, University of Cambridge
Felix Hill, University of Cambridge