Publication


Featured research published by Clare Llewellyn.


international conference on theory and practice of digital libraries | 2015

Extracting a Topic Specific Dataset from a Twitter Archive

Clare Llewellyn; Claire Grover; Beatrice Alex; Jon Oberlander; Richard Tobin

Datasets extracted from the microblogging service Twitter are often generated using specific query terms or hashtags. We describe how a dataset produced using the query term ‘syria’ can be increased in size to include tweets on the topic of Syria that do not contain that query term. We compare three methods for this task: using the top hashtags from the set as search terms, using a hand-selected set of hashtags as search terms, and using LDA topic modelling to cluster tweets and then selecting appropriate clusters. We describe an evaluation method for assessing the relevance and accuracy of the tweets returned.
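The first of the three methods in this abstract (top hashtags from the seed set used as search terms) can be illustrated with a minimal sketch. It is not the authors' implementation: the function names `top_hashtags` and `expand_dataset`, the cut-off `k`, and the representation of tweets as plain strings are all assumptions made for illustration.

```python
import re
from collections import Counter

# Simple hashtag pattern; real tweets may need a more careful tokenizer.
HASHTAG = re.compile(r"#(\w+)")


def top_hashtags(seed_tweets, k=3, exclude=()):
    """Return the k most frequent hashtags (lowercased) in the seed set."""
    counts = Counter(
        tag.lower()
        for text in seed_tweets
        for tag in HASHTAG.findall(text)
    )
    # Drop the original query term so it is not reused as a search term.
    for tag in exclude:
        counts.pop(tag.lower(), None)
    return [tag for tag, _ in counts.most_common(k)]


def expand_dataset(seed_tweets, archive, k=3, query_term="syria"):
    """Add archive tweets that carry a top hashtag but lack the query term."""
    tags = set(top_hashtags(seed_tweets, k=k, exclude=(query_term,)))
    extra = [
        text
        for text in archive
        if query_term not in text.lower()
        and tags & {t.lower() for t in HASHTAG.findall(text)}
    ]
    return list(seed_tweets) + extra
```

Called with a seed set collected for ‘syria’ and a larger archive, `expand_dataset` returns the seed tweets plus any archive tweets that share one of the most frequent seed hashtags without mentioning the query term itself. Whether those extra tweets are actually on topic still has to be evaluated separately.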


acm/ieee joint conference on digital libraries | 2006

A curated harvesting approach to establishing a multi-protocol online subject portal

Clare Llewellyn; John Harrison; Robert Sanderson

We describe a curated harvesting approach to creating and maintaining a subject portal comprising selected records harvested from remote services via information retrieval standards such as SRU, Z39.50 and OAI-PMH. The result was a Web-based data curation interface where administrative users can configure access to remote resources, specify the queries to be performed against them, and review records for inclusion in end-user searches.


meeting of the association for computational linguistics | 2016

Improving Topic Model Clustering of Newspaper Comments for Summarisation

Clare Llewellyn; Claire Grover; Jon Oberlander

Online newspaper articles can accumulate comments at volumes that prevent close reading. Summarisation of the comments allows interaction at a higher level and can lead to an understanding of the overall discussion. Comment summarisation requires topic clustering, comment ranking and extraction. Clustering must be robust, as the subsequent extraction relies on a good set of clusters. Comment data, as with many social media datasets, contains very short documents, and the number of words in the documents is a limiting factor on the performance of LDA clustering. We evaluate whether we can combine comments to form larger documents to improve the quality of clusters. We find that combining comments with the comments that reply to them produces the highest quality clusters.
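The idea of combining comments with their replies to form larger documents can be sketched as follows. This is one possible reading of the abstract, not the authors' code: the dictionary keys `id`, `parent` and `text`, the function name `pool_with_replies`, and the choice to merge whole reply chains (rather than direct replies only) are assumptions.

```python
from collections import defaultdict


def pool_with_replies(comments):
    """Merge each top-level comment with the replies in its thread.

    `comments` is a list of dicts with hypothetical keys:
    'id', 'parent' (None for a top-level comment) and 'text'.
    Returns a dict mapping each top-level comment id to the combined text.
    Assumes reply links form a tree (no cycles).
    """
    parent = {c["id"]: c["parent"] for c in comments}

    def root_of(comment_id):
        # Follow parent links up to the top-level comment.
        while parent.get(comment_id) is not None:
            comment_id = parent[comment_id]
        return comment_id

    pooled = defaultdict(list)
    for c in comments:
        pooled[root_of(c["id"])].append(c["text"])
    return {root: " ".join(texts) for root, texts in pooled.items()}
```

Each value in the returned dictionary is a longer pseudo-document that could then be passed to a topic model in place of the individual short comments.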


theory and practice of digital libraries | 2012

Enhancing the curation of botanical data using text analysis tools

Clare Llewellyn; Claire Grover; Jon Oberlander; Elspeth Haston

Automatic text analysis tools have significant potential to improve the productivity of those who organise large collections of data. However, to be effective, they have to be both technically efficient and provide a productive interaction with the user. Geographic referencing of historical botanical data is difficult, time-consuming and relies heavily on the expertise of the curators. Botanical specimens that have poor-quality labelling are often disregarded and the information is lost. This work highlights how automated analysis methods can be used to assist in the curation of a botanical specimen library.


acm/ieee joint conference on digital libraries | 2009

Evaluation of OAI-ORE via large-scale information topology visualization

Robert Sanderson; Clare Llewellyn; Richard Jones

This poster evaluates the OAI-ORE specifications through experiments providing access to the JSTOR digital archive and the Flickr website. A browser-based dynamic graph visualization tool was designed and tested to determine if making the topology of the information available would provide end-user benefits in terms of navigation and discovery.


acm/ieee joint conference on digital libraries | 2018

Russian Troll Hunting in a Brexit Twitter Archive

Clare Llewellyn; Laura Cram; Adrian Favero; Robin L. Hill

Twitter has identified 2,752 accounts that it believes are linked to the Internet Research Agency (IRA), a Russian company that creates online propaganda. These accounts are known to have tweeted about the US 2016 Elections and the list was submitted as evidence by Twitter to the United States Senate Judiciary Subcommittee on Crime and Terrorism. There is no equivalent officially published list of accounts from the IRA known to be active in the UK-EU Referendum debate (Brexit), but we found that the troll accounts active on the 2016 US Election also produced content related to Brexit. We found 3,485 tweets from 419 of the accounts listed as IRA accounts which specifically discussed Brexit and related topics such as the EU and migration. We have been collating an archive of tweets related to Brexit since August 2015 and currently have over 70 million tweets. The Brexit referendum took place on 23 June 2016 and the UK voted to leave the European Union. We gathered the data using the Twitter API and a selection of hashtags chosen by a panel of academic experts. Currently we have in excess of fifty different hashtags and we add to the set periodically to accurately represent the evolving conversation. Twitter has closed the accounts that were documented in the Senate list, meaning that these tweets are no longer available through the webpage or API. Due to Twitter's terms of service, we are unable to share specific tweet text or user profile information, but our findings, which use text and metadata from derived and aggregated data, allow us to provide important insights into the behaviour of these trolls.


acm/ieee joint conference on digital libraries | 2016

Avoiding the Drunkard's Search: Investigating Collection Strategies for Building a Twitter Dataset

Clare Llewellyn; Laura Cram; Adrian Favero

We investigate methods for collecting data to form an archive of the debate within Twitter surrounding the UK's inclusion in the EU. We use three strategies: gathering data using hashtags, extracting data from the random stream, and collecting from users known to be discussing the debate. We explore the various biases in the resulting datasets.


Archive | 2016

User-Driven Text Mining of Historical Text

Beatrice Alex; Claire Grover; Ewan Klein; Clare Llewellyn; Richard Tobin

This chapter presents a summary of work on text mining (TM) of historical documents for the discovery of 19th century trade in the British Empire as part of the Digging into Data (http://www.diggingintodata.org) project TRADING CONSEQUENCES (http://tradingconsequences.blogs.edina.ac.uk). The project aimed to assist environmental historians in understanding the economic and environmental consequences of commodity trading during the 19th century. We applied TM to large quantities of historical text, converting unstructured textual information into structured data. The structured data was used to populate a relational database that is in turn the back end for querying and different types of online visualisations. We will discuss some of the challenges involved when processing digitised historical text which originally appeared in printed form.


acm/ieee joint conference on digital libraries | 2014

Building a dataset of sensitive information

Clare Llewellyn; Laine Ruus; Ros Burnett; Steve Kirkwood; Mark A. Smith; Rocio von-Jungenfeld

Using text analysis tools to study large data sets is currently an area of popular interest. Prompted by the success of several big data research initiatives, researchers from a variety of disciplines wish to gather and analyse textual data. Communication between members of diverse teams can present a problem and developing a shared language and understanding of the task is necessary.


international conference on weblogs and social media | 2014

Summarizing Newspaper Comments

Clare Llewellyn; Claire Grover; Jon Oberlander

Collaboration


Dive into Clare Llewellyn's collaboration.

Top Co-Authors


Laura Cram

University of Strathclyde


Robert Sanderson

Los Alamos National Laboratory
