Publication


Featured research published by David Guthrie.


Conference of the European Chapter of the Association for Computational Linguistics | 2003

Mining web sites using adaptive information extraction

Alexiei Dingli; Fabio Ciravegna; David Guthrie; Yorick Wilks

Adaptive Information Extraction systems (IES) are currently used by some Semantic Web (SW) annotation tools to support annotation (Handschuh et al., 2002; Vargas-Vera et al., 2002). They are generally based on fully supervised methodologies that require fairly intensive domain-specific annotation. Unfortunately, selecting representative examples may be difficult, and annotations can be incorrect and time-consuming to produce. In this paper we present a methodology that drastically reduces (or even removes) the amount of manual annotation required when annotating consistent sets of pages. A very limited number of user-defined examples are used to bootstrap learning. Simple, high-precision (and possibly high-recall) IE patterns are induced from these examples; the patterns then discover more examples, which in turn discover more patterns, and so on.
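The bootstrapping loop described in this abstract can be illustrated with a toy sketch. It is not the authors' system: the function names (`induce_patterns`, `apply_patterns`, `bootstrap`), the fixed-width context patterns, and the sample page are all assumptions made for illustration.

```python
import re


def induce_patterns(pages, examples, width=8):
    """Collect (left, right) context pairs around every known example.

    Hypothetical pattern format: up to `width` characters on each side.
    """
    patterns = set()
    for page in pages:
        for example in examples:
            for match in re.finditer(re.escape(example), page):
                left = page[max(0, match.start() - width):match.start()]
                right = page[match.end():match.end() + width]
                if left and right:
                    patterns.add((left, right))
    return patterns


def apply_patterns(pages, patterns):
    """Return every string found between a pattern's left and right context."""
    found = set()
    for left, right in patterns:
        regex = re.compile(re.escape(left) + r"(.+?)" + re.escape(right))
        for page in pages:
            found.update(regex.findall(page))
    return found


def bootstrap(pages, seeds, rounds=5):
    """Alternate pattern induction and example discovery until nothing new appears."""
    examples = set(seeds)
    for _ in range(rounds):
        patterns = induce_patterns(pages, examples)
        new_examples = apply_patterns(pages, patterns) - examples
        if not new_examples:
            break
        examples |= new_examples
    return examples


# Made-up page with a regular layout; one seed example starts the loop.
page = "<li><b>Ann Lee</b></li><li><b>Bob Ray</b></li><li><b>Cy Dow</b></li>"
print(sorted(bootstrap([page], {"Ann Lee"})))  # ['Ann Lee', 'Bob Ray', 'Cy Dow']
```

A single seed is enough here only because the page is perfectly regular. On real pages, patterns this simple would need filtering for precision before their output is trusted as new examples.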


Archive | 2013

Methods for Collection and Evaluation of Comparable Documents

Monica Lestari Paramita; David Guthrie; Evangelos Kanoulas; Robert J. Gaizauskas; Paul D. Clough; Mark Sanderson

Considerable attention is being paid to methods for gathering and evaluating comparable corpora, not only to improve Statistical Machine Translation (SMT) but also for other applications, e.g. the extraction of paraphrases. Realising the potential value of such corpora requires efficient and effective methods for gathering and evaluating them. Most of these methods have been tested on retrieving document pairs for well-resourced languages; however, there is a lack of work on less popular (under-resourced) languages or domains. This chapter describes work on developing methods for automatically gathering comparable corpora from the Web, specifically for under-resourced languages. Different online sources are investigated, and an evaluation method is developed to assess the quality of the retrieved documents.


International Conference on Advanced Language Processing and Web Information Technology | 2007

Chinese Text Classification without Automatic Word Segmentation

Wei Liu; Ben Allison; David Guthrie; Louise Guthrie

Due to the lack of word boundaries in Asian systems of writing, machine processing of these languages often involves segmenting text into word units. This paper tests the assumption that this segmentation is a necessary step for authorship attribution and topic classification tasks in Chinese, and demonstrates that it is not. We show extensive results for both tasks, considering both single words and short phrases as features, and examining the effect of document length on classification accuracy. Our experiments show that a naïve character bigram model of text performs as well as models generated using a state-of-the-art automatic segmenter.
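As a rough illustration of the character bigram features mentioned in this abstract, the sketch below counts overlapping character pairs without any word segmentation. The function name `char_bigrams` and the choice to skip whitespace are assumptions; the paper's actual feature extraction and classifiers are not described here.

```python
from collections import Counter


def char_bigrams(text: str) -> Counter:
    """Count overlapping character bigrams, ignoring whitespace.

    No word segmenter is involved: every adjacent pair of characters
    becomes one feature.
    """
    chars = [c for c in text if not c.isspace()]
    return Counter(a + b for a, b in zip(chars, chars[1:]))


# A nine-character sentence yields eight overlapping bigrams.
features = char_bigrams("我喜欢自然语言处理")
print(sum(features.values()))  # 8
```

The resulting counts could be passed to any bag-of-features text classifier in place of segmented word counts.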


North American Chapter of the Association for Computational Linguistics | 2004

Maximum entropy modeling in sparse semantic tagging

Jia Cui; David Guthrie

In this work, we are concerned with a coarse-grained semantic analysis over sparse data, which labels all nouns with a set of semantic categories. To benefit from unlabeled data, we propose a bootstrapping framework with Maximum Entropy modeling (MaxEnt) as the statistical learning component. During the iterative tagging process, unlabeled data is used not only for better statistical estimation, but also as a medium for integrating non-statistical knowledge into model training. Two main issues are discussed in this paper. First, Association Rule principles are suggested to guide MaxEnt feature selection. Second, to guarantee the convergence of the bootstrapping process, three adjusting strategies are proposed to soft-tag unlabeled data.


Language Resources and Evaluation | 2006

A Closer Look at Skip-gram Modelling

David Guthrie; Ben Allison; Wei Liu; Louise Guthrie; Yorick Wilks


International Joint Conference on Artificial Intelligence | 2003

Integrating information to bootstrap information extraction from web sites

Fabio Ciravegna; Alexiei Dingli; David Guthrie; Yorick Wilks


International Joint Conference on Artificial Intelligence | 2007

Unsupervised anomaly detection

David Guthrie; Louise Guthrie; Ben Allison; Yorick Wilks


Empirical Methods in Natural Language Processing | 2010

Storing the Web in Memory: Space Efficient Language Models with Constant Time Retrieval

David Guthrie; Mark Hepple


Conference of the European Chapter of the Association for Computational Linguistics | 2003

Mining Web Sites Using Unsupervised Adaptive Information Extraction

Alexiei Dingli; Fabio Ciravegna; David Guthrie; Yorick Wilks


Meeting of the Association for Computational Linguistics | 2006

Towards the Orwellian Nightmare: Separation of Business and Personal Emails

Sanaz Jabbari; Ben Allison; David Guthrie; Louise Guthrie

Collaboration


Top Co-Authors

Yorick Wilks

University of Sheffield

Ben Allison

University of Sheffield

Mark Hepple

University of Sheffield

Wei Liu

University of Sheffield

Jia Cui

Johns Hopkins University
