Andrew Caines
University of Cambridge
Publications
Featured research published by Andrew Caines.
Text, Speech and Dialogue | 2015
Russell Moore; Andrew Caines; Calbert Graham; Paula Buttery
This paper investigates the suitability of state-of-the-art natural language processing (NLP) tools for parsing the spoken language of second language learners of English. The task of parsing spoken learner language is important to the domains of automated language assessment (ALA) and computer-assisted language learning (CALL). Due to the non-canonical nature of spoken language (containing filled pauses, non-standard grammatical variations, hesitations and other disfluencies), compounded by a lack of available training data, spoken language parsing has been a challenge for standard NLP tools. Recently the Redshift parser (Honnibal et al., in Proceedings of CoNLL 2013) has been shown to be successful in identifying grammatical relations and certain disfluencies in native speaker spoken language, returning unlabelled dependency accuracy of 90.5% and a disfluency F-measure of 84.1% (Honnibal & Johnson, TACL 2, 131-142, 2014). We investigate how this parser handles spoken data from learners of English at various proficiency levels. Firstly, we find that Redshift's parsing accuracy on non-native speech data is comparable to Honnibal & Johnson's results, with 91.1% of dependency relations correctly identified. However, disfluency detection is markedly down, with an F-measure of just 47.8%. We attempt to explain why this should be, and investigate the effect of proficiency level on parsing accuracy. We relate our findings to the use of NLP technology for CALL and ALA applications.
Recent Advances in Intrusion Detection | 2018
Sergio Pastrana; Alice Hutchings; Andrew Caines; Paula Buttery
Underground forums contain many thousands of active users, but the vast majority will be involved, at most, in minor levels of deviance. The number who engage in serious criminal activity is small. That being said, underground forums have played a significant role in several recent high-profile cybercrime activities. In this work we apply data science approaches to understand criminal pathways and characterize key actors related to illegal activity in one of the largest and longest-running underground forums. We combine the results of a logistic regression model with k-means clustering and social network analysis, verifying the findings using topic analysis. We identify variables relating to forum activity that predict the likelihood that a user will become an actor of interest to law enforcement, and would therefore benefit the most from intervention. This work provides the first step towards identifying ways to deter young people from a career in cybercrime.
Proceedings of the 2010 Workshop on NLP and Linguistics: Finding the Common Ground | 2010
Andrew Caines; Paula Buttery
Proceedings of the First Joint Workshop on Statistical Parsing of Morphologically Rich Languages and Syntactic Analysis of Non-Canonical Languages | 2014
Andrew Caines; Paula Buttery
Archive | 2012
Paula Buttery; Andrew Caines
Language Resources and Evaluation | 2016
Andrew Caines; Christian Bentz; Calbert Graham; Tim Polzehl; Paula Buttery
Language Resources and Evaluation | 2016
Wanru Zhang; Andrew Caines; Dimitrios Alikaniotis; Paula Buttery
International Conference on Computational Linguistics | 2016
Russell Moore; Andrew Caines; Calbert Graham; Paula Buttery
Workshop on Innovative Use of NLP for Building Educational Applications | 2017
Andrew Caines; Emma Flint; Paula Buttery
Empirical Methods in Natural Language Processing | 2017
Andrew Caines; Michael McCarthy; Paula Buttery