Alistair Baron
Lancaster University
Publications
Featured research published by Alistair Baron.
IEEE Computer | 2013
Awais Rashid; Alistair Baron; Paul Rayson; Corinne May-Chahal; Phil Greenwood; James Walkerdine
The Isis toolkit offers the sophisticated capabilities required to analyze digital personas and provide investigators with clues to the identity of the individual or group hiding behind one or more personas.
Transactions in GIS | 2015
Patricia Murrieta-Flores; Alistair Baron; Ian N. Gregory; Andrew Hardie; Paul Rayson
The aim of this article is to present new research showcasing how Geographic Information Systems in combination with Natural Language Processing and Corpus Linguistics methods can offer innovative avenues of research for analyzing large textual collections in the Humanities, particularly in historical research. Using as examples parts of the collection of the Registrar General's Reports, which contain more than 200,000 pages of descriptions, census data and vital statistics for the UK, we introduce newly developed automated textual tools and well-known spatial analyses used in combination to investigate a case study of the references made to cholera and other diseases in these historical sources, and their relationship to place-names during Victorian times. The integration of such techniques has allowed us to explore, in an automatic way, this historical source containing millions of words, to examine the geographies depicted in it, and to identify textual and geographic patterns in the corpus.
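As a rough illustration of the kind of pipeline the abstract describes, the sketch below pairs disease terms with gazetteer place names at sentence level and counts the co-occurrences. The word lists and matching rules are invented for this example; the authors' actual tools operate over a far richer geoparsing and corpus infrastructure.

```python
# Illustrative sketch (not the authors' pipeline): find disease terms and
# gazetteer place names in a text, then count which pairs co-occur within
# the same sentence. All word lists here are invented examples.
import re
from collections import Counter
from itertools import product

DISEASES = {"cholera", "typhus", "smallpox", "scarlatina"}
GAZETTEER = {"london", "liverpool", "manchester", "lancaster"}

def disease_place_pairs(text):
    pairs = Counter()
    for sentence in re.split(r"[.!?]", text):
        words = set(re.findall(r"[a-z]+", sentence.lower()))
        for disease, place in product(words & DISEASES, words & GAZETTEER):
            pairs[(disease, place)] += 1
    return pairs

report = ("Deaths from cholera in Liverpool rose sharply this quarter. "
          "Manchester again reported typhus in its crowded districts.")
print(disease_place_pairs(report))
# Counter({('cholera', 'liverpool'): 1, ('typhus', 'manchester'): 1})
```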
ICAME Journal | 2015
Dawn Archer; Merja Kytö; Alistair Baron; Paul Rayson
Corpora of Early Modern English have been collected and released for research for a number of years. With large-scale digitisation activities gathering pace in the last decade, much more historical textual data is now available for research on numerous topics, including historical linguistics and conceptual history. We summarise previous research which has shown that it is necessary to map historical spelling variants to modern equivalents in order to successfully apply natural language processing and corpus linguistics methods. Manual and semi-automatic methods have been devised to support this normalisation and standardisation process. We argue that it is important to develop a linguistically meaningful rationale to achieve good results from this process. In order to do so, we propose a number of guidelines for normalising corpora and show how these guidelines have been applied in the Corpus of English Dialogues.
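The core normalisation step lends itself to a compact sketch: consult a curated variant list first, then fall back to similarity matching against a modern lexicon. This is a minimal sketch with invented word lists, not the authors' actual normalisation software or its ruleset.

```python
# Minimal sketch of spelling normalisation: map a historical form to a
# modern one via a known-variants list, falling back to fuzzy matching
# against a modern lexicon. Word lists are invented for illustration.
import difflib

MODERN_LEXICON = ["i", "have", "loved", "love", "music", "very", "much"]
KNOWN_VARIANTS = {"haue": "have", "musick": "music"}

def normalise(token):
    word = token.lower()
    if word in MODERN_LEXICON:            # already a modern form
        return word
    if word in KNOWN_VARIANTS:            # rule from a curated variant list
        return KNOWN_VARIANTS[word]
    # otherwise pick the closest modern form by string similarity
    matches = difflib.get_close_matches(word, MODERN_LEXICON, n=1, cutoff=0.8)
    return matches[0] if matches else word  # leave unknown words untouched

print([normalise(w) for w in "I haue loued musick verye much".split()])
# ['i', 'have', 'loved', 'music', 'very', 'much']
```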
Computers & Security | 2017
Matthew John Edwards; Robert Larson; Benjamin Green; Awais Rashid; Alistair Baron
The process of social engineering targets people rather than IT infrastructure. Attackers use deceptive ploys to create compelling behavioural and cosmetic hooks, which in turn lead a target to disclose sensitive information or to interact with a malicious payload. The creation of such hooks requires background information on targets. Individuals are increasingly releasing information about themselves online, particularly on social networks. Though existing research has demonstrated the social engineering risks posed by such open source intelligence, this has been accomplished either through resource-intensive manual analysis or via interactive information harvesting techniques. As manual analysis of large-scale online information is impractical, and interactive methods risk alerting the target, alternatives are desirable. In this paper, we demonstrate that key information pertinent to social engineering attacks on organisations can be passively harvested on a large scale in an automated fashion. We address two key problems. First, we demonstrate that it is possible to automatically identify employees of an organisation using only information which is visible to a remote attacker as a member of the public. Secondly, we show that, once identified, employee profiles can be linked across multiple online social networks to harvest additional information pertinent to successful social engineering attacks. We further demonstrate our approach through analysis of the social engineering attack surface of real critical infrastructure organisations. Based on our analysis, we propose a set of countermeasures, including an automated social engineering vulnerability scanner that organisations can use to analyse their exposure to potential social engineering attacks arising from open source intelligence.
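The cross-network linking step can be pictured with a toy scorer that combines name similarity with overlap in listed profile attributes. The field names, weights, and scoring below are invented for illustration and do not reproduce the paper's actual features or models.

```python
# Hypothetical sketch of profile linking across networks: score a candidate
# pair on name similarity plus Jaccard overlap of listed attributes.
# Field names and weights are illustrative, not the paper's actual model.
from difflib import SequenceMatcher

def link_score(profile_a, profile_b):
    """Return a 0..1 score that two profiles belong to the same person."""
    name_sim = SequenceMatcher(None, profile_a["name"].lower(),
                               profile_b["name"].lower()).ratio()
    attrs_a = {a.lower() for a in profile_a["attrs"]}
    attrs_b = {a.lower() for a in profile_b["attrs"]}
    attr_sim = len(attrs_a & attrs_b) / max(len(attrs_a | attrs_b), 1)
    return 0.7 * name_sim + 0.3 * attr_sim  # weighted toward name evidence

linkedin = {"name": "Jane A. Doe", "attrs": ["ExampleCorp", "Lancaster"]}
twitter = {"name": "jane doe", "attrs": ["lancaster", "running", "examplecorp"]}
print(round(link_score(linkedin, twitter), 2))  # higher = likelier match
```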
Digital Scholarship in the Humanities | 2015
Marc Alexander; Fraser Dallachy; Scott Piao; Alistair Baron; Paul Rayson
Metaphor is widely used in popular science to aid readers’ conceptions of the scientific concepts under discussion. Almost all research in this area has been done by careful close reading of the text(s) in question, but this article describes, for the first time, a digital ‘distant reading’ analysis of popular science, using a system created by a team from Glasgow and Lancaster. This team, as part of the SAMUELS project, has developed semantic tagging software which is based upon the UCREL Semantic Analysis System developed by Lancaster University’s University Centre for Computer Corpus Research on Language, but using the uniquely comprehensive Historical Thesaurus of English (published in 2009 as The Historical Thesaurus of the Oxford English Dictionary) as its knowledge base, in order to provide fine-grained meaning distinctions for use in word-sense disambiguation. In addition to analyzing metaphors in highly abstract book-length popular science texts from physics and mathematics, this article describes the technical underpinning of the system and the methods employed to hone the word-sense disambiguation procedure.
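One way to picture tag-based "distant reading" of metaphor: flag tokens whose semantic field clashes with the dominant field of the surrounding passage. The tiny tag lexicon below is invented; the real system draws its fine-grained fields from the Historical Thesaurus.

```python
# Toy sketch of metaphor-candidate spotting: flag tokens whose semantic
# field differs from the passage's dominant field. The tag lexicon is
# invented; the real system uses the Historical Thesaurus as its source.
from collections import Counter

SEM_TAGS = {
    "particle": "physics", "field": "physics", "energy": "physics",
    "dance": "movement", "waves": "nature",
}

def metaphor_candidates(tokens):
    tags = [SEM_TAGS.get(t) for t in tokens]
    known = [t for t in tags if t]
    if not known:
        return []
    dominant, _ = Counter(known).most_common(1)[0]
    return [tok for tok, tag in zip(tokens, tags) if tag and tag != dominant]

text = "the particle and the field exchange energy in a dance of waves"
print(metaphor_candidates(text.split()))  # -> ['dance', 'waves']
```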
Computer Speech & Language | 2017
Scott Piao; Fraser Dallachy; Alistair Baron; Jane Demmen; Steve Wattam; Philip Durkin; James McCracken; Paul Rayson; Marc Alexander
Automatic extraction and analysis of meaning-related information from natural language data has been an important issue in a number of research areas, such as natural language processing (NLP), text mining, corpus linguistics, and data science. An important aspect of such information extraction and analysis is the semantic annotation of language data using a semantic tagger. In practice, various semantic annotation tools have been designed to carry out different levels of semantic annotation, such as topics of documents, semantic role labeling, named entities or events. Currently, the majority of existing semantic annotation tools identify and tag partial core semantic information in language data, but they tend to be applicable only for modern language corpora. While such semantic analyzers have proven useful for various purposes, a semantic annotation tool that is capable of annotating deep semantic senses of all lexical units, or all-words tagging, is still desirable for a deep, comprehensive semantic analysis of language data. With large-scale digitization efforts underway, delivering historical corpora with texts dating from the last 400 years, a particularly challenging aspect is the need to adapt the annotation in the face of significant word meaning change over time. In this paper, we report on the development of a new semantic tagger (the Historical Thesaurus Semantic Tagger), and discuss challenging issues we faced in this work. This new semantic tagger is built on existing NLP tools and incorporates a large-scale historical English thesaurus linked to the Oxford English Dictionary. Employing contextual disambiguation algorithms, this tool is capable of annotating lexical units with a historically valid, fine-grained semantic categorization scheme that contains about 225,000 semantic concepts and 4,033 thematic semantic categories. In terms of novelty, it is adapted for processing historical English data, with rich information about historical usage of words and a spelling variant normalizer for historical forms of English. Furthermore, it is able to make use of knowledge about the publication date of a text to adapt its output. In our evaluation, the system achieved encouraging accuracies ranging from 77.12% to 91.08% on individual test texts. Applying time-sensitive methods improved results by as much as 3.54% and by 1.72% on average.
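The time-sensitive aspect can be sketched as a date filter over candidate senses: keep only senses whose attested date range covers the text's publication year, then disambiguate among the survivors. The sense entries below are invented for illustration, not taken from the Historical Thesaurus.

```python
# Hedged sketch of date-aware sense filtering: restrict a word's candidate
# senses to those attested at the text's publication date. Sense data is
# invented for illustration, not real Historical Thesaurus content.
SENSES = {
    "gay": [
        {"category": "happiness", "first": 1300, "last": 1950},
        {"category": "sexuality", "first": 1940, "last": 9999},
    ],
}

def candidate_senses(word, pub_year):
    """Return senses of `word` that were in use in `pub_year`."""
    return [s for s in SENSES.get(word, [])
            if s["first"] <= pub_year <= s["last"]]

# A text published in 1850 should only see the 'happiness' sense.
print([s["category"] for s in candidate_senses("gay", 1850)])  # ['happiness']
```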
Computers & Security | 2016
William Knowles; Alistair Baron; Tim McGarr
Simulated security assessments (a collective term used here for penetration testing, vulnerability assessment, and related nomenclature) may need standardisation, but not in the commonly assumed manner of practical assessment methodologies. Instead, this study highlights market failures within the providing industry at the beginning and ending of engagements, which has left clients receiving ambiguous and inconsistent services. It is here, at the prior and subsequent phases of practical assessments, that standardisation may serve the continuing professionalisation of the industry, and provide benefits not only to clients but also to the practitioners involved in the provision of these services. These findings are based on the results of 54 stakeholder interviews with providers of services, clients, and coordinating bodies within the industry. The paper culminates with a framework for future advancement of the ecosystem, which includes three recommendations for standardisation.
Archive | 2008
Alistair Baron; Paul Rayson
Archive | 2007
Paul Rayson; Dawn Archer; Alistair Baron; Jonathan Culpeper; Nicholas Smith
Proceedings of the Corpus Linguistics Conference 2009 (CL2009), p. 314 | 2009
Alistair Baron; Paul Rayson