Publications

Featured research published by Beatrice Alex.


Pacific Symposium on Biocomputing | 2007

Assisted Curation: Does Text Mining Really Help?

Beatrice Alex; Claire Grover; Barry Haddow; Mijail Kabadjov; Ewan Klein; Michael Matthews; Stuart Roebuck; Richard Tobin; Xinglong Wang

Although text mining shows considerable promise as a tool for supporting the curation of biomedical text, there is little concrete evidence as to its effectiveness. We report on three experiments measuring the extent to which curation can be speeded up with assistance from Natural Language Processing (NLP), together with subjective feedback from curators on the usability of a curation tool that integrates NLP hypotheses for protein-protein interactions (PPIs). In our curation scenario, we found that a maximum speed-up of 1/3 in curation time can be expected if NLP output is perfectly accurate. The preference of one curator for consistent NLP output and output with high recall needs to be confirmed in a larger study with several curators.


Meeting of the Association for Computational Linguistics | 2007

Recognising Nested Named Entities in Biomedical Text

Beatrice Alex; Barry Haddow; Claire Grover

Although recent named entity (NE) annotation efforts involve the markup of nested entities, there has been limited focus on recognising such nested structures. This paper introduces and compares three techniques for modelling and recognising nested entities by means of a conventional sequence tagger. The methods are tested and evaluated on two biomedical data sets that contain entity nesting. All methods yield an improvement over the baseline tagger that is only trained on flat annotation.
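
The abstract leaves the three modelling techniques unspecified. As background, one common way to make a flat sequence tagger handle nesting is to join the labels of all nesting layers into a single tag per token; the sketch below shows that encoding on a made-up example (hypothetical labels, not necessarily the paper's exact scheme).

```python
# Sketch: encode nested entities as joined per-token labels so a
# conventional flat BIO sequence tagger can model both layers at once.
# Hypothetical example: "EGF" (a protein) nested inside the longer
# mention "human EGF receptor".

tokens = ["human", "EGF", "receptor", "binds", "the", "ligand"]

# One BIO layer per nesting level (outermost first); labels are made up.
outer = ["B-PROT", "I-PROT", "I-PROT", "O", "O", "O"]
inner = ["O",      "B-PROT", "O",      "O", "O", "O"]

def join_layers(*layers):
    """Collapse several BIO layers into one joined tag per token."""
    return ["+".join(tags) for tags in zip(*layers)]

for token, tag in zip(tokens, join_layers(outer, inner)):
    print(f"{token}\t{tag}")
# human    B-PROT+O
# EGF      I-PROT+B-PROT
# receptor I-PROT+O
# ...
```

A tagger trained on the joined tags predicts both layers jointly; splitting each predicted tag on "+" recovers the nested annotation.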


Genome Biology | 2008

Automating curation using a natural language processing pipeline

Beatrice Alex; Claire Grover; Barry Haddow; Mijail Kabadjov; Ewan Klein; Michael Matthews; Richard Tobin; Xinglong Wang

Background: The tasks in BioCreative II were designed to approximate some of the laborious work involved in curating biomedical research papers. The approach to these tasks taken by the University of Edinburgh team was to adapt and extend the existing natural language processing (NLP) system that we have developed as part of a commercial curation assistant. Although this paper concentrates on using NLP to assist with curation, the system can equally be employed to extract types of information from the literature that are immediately relevant to biologists in general.

Results: Our system was among the highest performing on the interaction subtasks, and competitive performance on the gene mention task was achieved with minimal development effort. For the gene normalization task, a string matching technique that can be quickly applied to new domains was shown to perform close to average.

Conclusion: The technologies being developed were shown to be readily adapted to the BioCreative II tasks. Although high performance may be obtained on individual tasks such as gene mention recognition and normalization, and document classification, tasks in which a number of components must be combined, such as detection and normalization of interacting protein pairs, are still challenging for NLP systems.
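
The gene normalization component is described only as "a string matching technique that can be quickly applied to new domains". Purely as an illustration of that idea (not the Edinburgh team's implementation), here is a minimal fuzzy-matching normaliser using Python's standard difflib; the lexicon entries are hypothetical.

```python
# Sketch: normalise a gene mention to an identifier by fuzzy string
# matching against a synonym lexicon. Illustrative only; the lexicon
# below is a hypothetical stand-in for a real gene name resource.
import difflib

names_to_ids = {
    "epidermal growth factor receptor": "GENE:0001",
    "egfr": "GENE:0001",
    "tumor protein p53": "GENE:0002",
    "tp53": "GENE:0002",
}

def normalise(mention, cutoff=0.8):
    """Return the identifier of the closest synonym, or None."""
    hits = difflib.get_close_matches(mention.lower(), names_to_ids,
                                     n=1, cutoff=cutoff)
    return names_to_ids[hits[0]] if hits else None

print(normalise("EGFR"))    # GENE:0001 (exact match)
print(normalise("TP-53"))   # GENE:0002 (close match to "tp53")
print(normalise("kinase"))  # None (no synonym above the cutoff)
```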


Digital Scholarship in the Humanities | 2015

Trading Consequences: A Case Study of Combining Text Mining and Visualization to Facilitate Document Exploration

Uta Hinrichs; Beatrice Alex; Jim Clifford; Andrew Watson; Aaron J. Quigley; Ewan Klein; Colin M. Coates

Large-scale digitization efforts and the availability of computational methods, including text mining and information visualization, have enabled new approaches to historical research. However, we lack case studies of how these methods can be applied in practice and what their potential impact may be. Trading Consequences is an interdisciplinary research project between environmental historians, computational linguists, and visualization specialists. It combines text mining and information visualization alongside traditional research methods in environmental history to explore commodity trade in the 19th century from a global perspective. Along with a unique data corpus, this project developed three visual interfaces to enable the exploration and analysis of four historical document collections, consisting of approximately 200,000 documents and 11 million pages related to commodity trading. In this article, we discuss the potential and limitations of our approach based on feedback from historians we elicited over the course of this project. Informing the design of such tools in the larger context of digital humanities projects, our findings show that visualization-based interfaces are a valuable starting point to large-scale explorations in historical research. Besides providing multiple visual perspectives on the document collection to highlight general patterns, it is important to provide a context in which these patterns occur and offer analytical tools for more in-depth investigations.


International Journal of Humanities and Arts Computing | 2015

Adapting the Edinburgh Geoparser for Historical Georeferencing

Beatrice Alex; Kate Byrne; Claire Grover; Richard Tobin

Place name mentions in text may have more than one potential referent (e.g. Peru, the country vs. Peru, the city in Indiana). The Edinburgh Language Technology Group (LTG) has developed the Edinburgh Geoparser, a system that can automatically recognise place name mentions in text and disambiguate them with respect to a gazetteer. The recognition step is required to identify location mentions in a given piece of text. The subsequent disambiguation step, generally referred to as georesolution, grounds location mentions to their corresponding gazetteer entries with latitude and longitude values, for example, to visualise them on a map. Geoparsing is not only useful for mapping purposes but also for making document collections more accessible as it can provide additional metadata about the geographical content of documents. Combined with other information mined from text such as person names and date expressions, complex relations between such pieces of information can be identified.
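
As a toy illustration of the georesolution step, the sketch below grounds a recognised place name by ranking gazetteer candidates, using the abstract's own Peru example; the mini-gazetteer and the population heuristic are simplifications of what a full geoparser does.

```python
# Sketch: ground a recognised place name against a gazetteer.
# Hypothetical mini-gazetteer; real systems query resources such as
# GeoNames and combine several disambiguation clues, not just size.

gazetteer = {
    "Peru": [
        {"type": "country", "lat": -9.19, "lon": -75.02,
         "population": 33_000_000},
        {"type": "city (Indiana, US)", "lat": 40.75, "lon": -86.07,
         "population": 11_000},
    ],
}

def georesolve(mention):
    """Pick the most plausible referent; here the most populous one."""
    candidates = gazetteer.get(mention, [])
    return max(candidates, key=lambda c: c["population"], default=None)

best = georesolve("Peru")
print(best["type"], best["lat"], best["lon"])
# country -9.19 -75.02
```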


Meeting of the Association for Computational Linguistics | 2005

An Unsupervised System for Identifying English Inclusions in German Text

Beatrice Alex

We present an unsupervised system that exploits linguistic knowledge resources, namely English and German lexical databases and the World Wide Web, to identify English inclusions in German text. We describe experiments with this system and the corpus which was developed for this task. We report the classification results of our system and compare them to the performance of a trained machine learner in a series of in- and cross-domain experiments.
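
As a rough sketch of the lexicon-lookup component (the full system also consults the Web, which is omitted here), a token found in an English wordlist but not in a German one can be flagged as a candidate English inclusion; the tiny wordlists below are hypothetical stand-ins for full lexical databases.

```python
# Sketch: flag English inclusions in German text by lexicon lookup.
# The wordlists are tiny hypothetical stand-ins for lexical databases.

english_lexicon = {"update", "software", "meeting", "download"}
german_lexicon = {"das", "ist", "ein", "für", "die", "heute", "wichtig"}

def find_inclusions(tokens):
    """Return tokens found in the English lexicon but not the German one."""
    return [t for t in tokens
            if t.lower() in english_lexicon
            and t.lower() not in german_lexicon]

sentence = "Das Update für die Software ist heute wichtig".split()
print(find_inclusions(sentence))
# ['Update', 'Software']
```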


Proceedings of the First International Conference on Digital Access to Textual Cultural Heritage | 2014

Estimating and rating the quality of optically character recognised text

Beatrice Alex; John Burns

The focus of this paper is on the quality of historical text digitised through optical character recognition (OCR) and how it affects text mining. We study the effect OCR errors have on named entity recognition (NER) and show that in a random sample of documents picked from several historical text collections, 30.6% of false negative commodity and location mentions and 13.3% of all manually annotated commodity and location mentions contain OCR errors. We introduce a simple method for estimating text quality of OCRed text and examine how well human raters can evaluate it. We also illustrate how automatic text quality estimation compares to manual rating with the aim of determining a quality threshold below which documents could potentially be discarded or would require extensive correction first. This work was conducted during the Trading Consequences project which focussed on text mining and visualisation of historical documents for the study of nineteenth century trade.
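
The abstract does not spell out the estimation method. One simple proxy, sketched below under that assumption, scores a text by the share of its alphabetic tokens found in a dictionary, with low scores indicating heavy OCR damage; the wordlist is a hypothetical stand-in and the metric is illustrative, not necessarily the paper's own.

```python
# Sketch: crude OCR quality score as the fraction of alphabetic tokens
# that appear in a dictionary. Illustrative only; the wordlist is made up.
import re

dictionary = {"the", "price", "of", "cotton", "rose", "sharply", "in", "london"}

def ocr_quality(text, dictionary):
    """Return the fraction of alphabetic tokens found in the dictionary."""
    tokens = re.findall(r"[A-Za-z]+", text)
    if not tokens:
        return 0.0
    return sum(t.lower() in dictionary for t in tokens) / len(tokens)

clean = "The price of cotton rose sharply in London"
noisy = "Tlie pr1ce of c0tton rosc sharplv in Lond0n"
print(f"{ocr_quality(clean, dictionary):.2f}")  # 1.00
print(f"{ocr_quality(noisy, dictionary):.2f}")  # ~0.18: most tokens unknown
```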


International Conference on Theory and Practice of Digital Libraries | 2015

Extracting a Topic Specific Dataset from a Twitter Archive

Clare Llewellyn; Claire Grover; Beatrice Alex; Jon Oberlander; Richard Tobin

Datasets extracted from the microblogging service Twitter are often generated using specific query terms or hashtags. We describe how a dataset produced using the query term ‘syria’ can be increased in size to include tweets on the topic of Syria that do not contain that query term. We compare three methods for this task: using the top hashtags from the set as search terms, using a hand-selected set of hashtags as search terms, and using LDA topic modelling to cluster tweets and selecting appropriate clusters. We describe an evaluation method for assessing the relevance and accuracy of the tweets returned.
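
The first of the three methods, reusing the seed set's most frequent hashtags as new search terms, takes only a few lines; the tweets below are invented examples, and the LDA-based method would additionally need a topic-modelling library.

```python
# Sketch: count hashtags in a seed tweet set and return the most
# frequent ones for reuse as search terms. Tweets are invented examples.
from collections import Counter
import re

tweets = [
    "Reports from #Aleppo tonight #Syria",
    "Aid convoy reaches #Aleppo",
    "Talks resume in Geneva #Syria #peace",
    "#Aleppo residents describe the situation",
]

def top_hashtags(tweets, n=3):
    """Return the n most frequent hashtags across the tweets."""
    counts = Counter(tag.lower()
                     for tweet in tweets
                     for tag in re.findall(r"#\w+", tweet))
    return counts.most_common(n)

print(top_hashtags(tweets))
# [('#aleppo', 3), ('#syria', 2), ('#peace', 1)]
```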


Cross-Language Evaluation Forum | 2004

Cross-lingual question answering using off-the-shelf machine translation

Kisuh Ahn; Beatrice Alex; Johan Bos; Tiphaine Dalmas; Jochen L. Leidner; Matthew Smillie

We show how to adapt an existing monolingual open-domain QA system to perform in a cross-lingual environment, using off-the-shelf machine translation software. In our experiments we use French and German as source languages, and English as the target language. For answering factoid questions, our system performs with an accuracy of 16% (German to English) and 20% (French to English), respectively. The loss of correctly answered questions caused by the MT component is estimated at 10% for French, and 15% for German. The accuracy of our system on correctly translated questions is 28% for German and 29% for French.
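
The architecture is a straightforward composition: translate the question with off-the-shelf MT, then hand it unchanged to the monolingual QA system. The sketch below shows only that shape; translate_to_english and answer_question are hypothetical placeholders with canned outputs, not real MT or QA APIs.

```python
# Sketch: cross-lingual QA as MT followed by unmodified monolingual QA.
# Both components are placeholders with canned demo outputs.

def translate_to_english(question: str, source_lang: str) -> str:
    """Placeholder for an off-the-shelf MT system."""
    canned = {"Wann wurde die Berliner Mauer gebaut?":
              "When was the Berlin Wall built?"}
    return canned.get(question, question)

def answer_question(english_question: str) -> str:
    """Placeholder for the existing English open-domain QA system."""
    canned = {"When was the Berlin Wall built?": "1961"}
    return canned.get(english_question, "unknown")

def cross_lingual_qa(question: str, source_lang: str) -> str:
    # Only the question crosses the language barrier; the QA system
    # itself is untouched, which is the point of the design.
    return answer_question(translate_to_english(question, source_lang))

print(cross_lingual_qa("Wann wurde die Berliner Mauer gebaut?", "de"))
# 1961
```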


Historical Methods: A Journal of Quantitative and Interdisciplinary History | 2016

Geoparsing history: Locating commodities in ten million pages of nineteenth-century sources

Jim Clifford; Beatrice Alex; Colin M. Coates; Ewan Klein; Andrew Watson

In the Trading Consequences project, historians, computational linguists, and computer scientists collaborated to develop a text mining system that extracts information from a vast amount of digitized published English-language sources from the “long nineteenth century” (1789 to 1914). The project focused on identifying relationships within the texts between commodities, geographical locations, and dates. The authors explain the methodology, uses, and the limitations of applying digital humanities techniques to historical research, and they argue that interdisciplinary approaches are critically important in addressing the technical challenges that arise. Collaborative teamwork of the kind described here has considerable potential to produce further advances in the large-scale analysis of historical documents.

Collaboration

Dive into Beatrice Alex's collaborations.

Top Co-Authors

Ewan Klein, University of Edinburgh
Uta Hinrichs, University of St Andrews
Barry Haddow, University of Edinburgh
Jim Clifford, University of Saskatchewan
Ben Hachey, University of Edinburgh
Kate Byrne, University of Edinburgh