Miles Osborne
University of Edinburgh
Publications
Featured research published by Miles Osborne.
Language and Technology Conference | 2006
Chris Callison-Burch; Philipp Koehn; Miles Osborne
Parallel corpora are crucial for training SMT systems. However, for many language pairs they are available only in very limited quantities. For these language pairs a huge portion of phrases encountered at run-time will be unknown. We show how techniques from paraphrasing can be used to deal with these otherwise unknown source language phrases. Our results show that augmenting a state-of-the-art SMT system with paraphrases leads to significantly improved coverage and translation quality. For a training corpus with 10,000 sentence pairs we increase the coverage of unique test set unigrams from 48% to 90%, with more than half of the newly covered items accurately translated, as opposed to none in current approaches.
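To make the technique concrete, here is a minimal sketch of borrowing translation options for an unknown source phrase from paraphrases that the phrase table does cover; the toy tables and the multiplicative score are illustrative assumptions, not the paper's data or feature model.

```python
# Minimal sketch of paraphrase augmentation: when a source phrase is
# missing from the phrase table, borrow translations from paraphrases
# of it that the table does cover. All entries here are invented.

phrase_table = {"under control": [("unter kontrolle", 0.8)]}
paraphrases = {"in check": [("under control", 0.6), ("curbed", 0.3)]}

def augment(phrase_table, paraphrases):
    """Add translation options for unknown phrases via known paraphrases."""
    augmented = {src: list(opts) for src, opts in phrase_table.items()}
    for unknown, cands in paraphrases.items():
        if unknown in phrase_table:
            continue  # already covered, nothing to borrow
        for para, p_para in cands:
            for target, p_trans in phrase_table.get(para, []):
                # Score the borrowed option by the paraphrase probability
                # times the borrowed translation probability.
                augmented.setdefault(unknown, []).append((target, p_para * p_trans))
    return augmented

print(augment(phrase_table, paraphrases)["in check"])  # [('unter kontrolle', 0.48...)]
```

In the paper the paraphrase probability is added as a feature of the log-linear translation model; the simple product above only stands in for that combination.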
North American Chapter of the Association for Computational Linguistics | 2003
Mark Steedman; Rebecca Hwa; Stephen Clark; Miles Osborne; Anoop Sarkar; Julia Hockenmaier; Paul Ruhlen; Steven Baker; Jeremiah Crim
This paper investigates bootstrapping for statistical parsers to reduce their reliance on manually annotated training data. We consider both a mostly-unsupervised approach, co-training, in which two parsers are iteratively re-trained on each other's output; and a semi-supervised approach, corrected co-training, in which a human corrects each parser's output before adding it to the training data. The selection of labeled training examples is an integral part of both frameworks. We propose several selection methods based on the criteria of minimizing errors in the data and maximizing training utility. We show that incorporating the utility criterion into the selection method results in better parsers for both frameworks.
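A hedged sketch of the two selection criteria named in the abstract, with invented confidence scores as proxies for estimated accuracy (minimising errors) and training utility (what the receiving parser still has to learn):

```python
# Sketch of selection for co-training. Scores are invented proxies:
# the producing parser's confidence stands in for accuracy, and the
# receiving parser's uncertainty stands in for training utility.
def select(candidates, max_added=10):
    """candidates: (sentence, parse, producer_conf, receiver_conf) tuples."""
    def score(c):
        _, _, producer_conf, receiver_conf = c
        accuracy = producer_conf          # avoid adding errors to the data
        utility = 1.0 - receiver_conf     # prefer what the receiver can learn from
        return accuracy * utility
    return sorted(candidates, key=score, reverse=True)[:max_added]

cands = [("s1", "t1", 0.95, 0.90),   # accurate but already known
         ("s2", "t2", 0.90, 0.30),   # accurate and informative
         ("s3", "t3", 0.40, 0.20)]   # informative but probably wrong
print([c[0] for c in select(cands)])  # ['s2', 's3', 's1']
```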
Conference of the European Chapter of the Association for Computational Linguistics | 2003
Mark Steedman; Miles Osborne; Anoop Sarkar; Stephen Clark; Rebecca Hwa; Julia Hockenmaier; Paul Ruhlen; Steven Baker; Jeremiah Crim
We present a practical co-training method for bootstrapping statistical parsers using a small amount of manually parsed training material and a much larger pool of raw sentences. Experimental results show that unlabelled sentences can be used to improve the performance of statistical parsers. In addition, we consider the problem of bootstrapping parsers when the manually parsed training material is in a different domain to either the raw sentences or the testing material. We show that bootstrapping continues to be useful, even though no manually produced parses from the target domain are used.
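For flavour, a runnable toy of the co-training loop itself: a small labelled seed, a large pool of raw examples, and two learners teaching each other their confident labels. Simple classifiers stand in for statistical parsers, and the data, thresholds and round count are all invented.

```python
# Toy co-training loop: two learners trained on a small seed, each
# passing its confidently-labelled pool examples to the other.
import numpy as np
from sklearn.naive_bayes import GaussianNB
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)
X = rng.normal(size=(500, 4))
y = (X[:, 0] + X[:, 1] > 0).astype(int)
Xa, ya = X[:20].copy(), y[:20].copy()     # labelled seed for learner a
Xb, yb = X[:20].copy(), y[:20].copy()     # labelled seed for learner b
pool = X[20:]                             # raw, unlabelled pool

a, b = GaussianNB(), LogisticRegression()
for _ in range(5):                        # co-training rounds
    a.fit(Xa, ya); b.fit(Xb, yb)
    pa, pb = a.predict_proba(pool), b.predict_proba(pool)
    pick_a = pa.max(1) > 0.99             # a's confident labels teach b
    pick_b = pb.max(1) > 0.99             # and vice versa
    Xb, yb = np.vstack([Xb, pool[pick_a]]), np.concatenate([yb, pa.argmax(1)[pick_a]])
    Xa, ya = np.vstack([Xa, pool[pick_b]]), np.concatenate([ya, pb.argmax(1)[pick_b]])
    pool = pool[~(pick_a | pick_b)]       # used examples leave the pool
print(a.score(X, y), b.score(X, y))
```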
Meeting of the Association for Computational Linguistics | 2004
Chris Callison-Burch; David Talbot; Miles Osborne
The parameters of statistical translation models are typically estimated from sentence-aligned parallel corpora. We show that significant improvements in the alignment and translation quality of such models can be achieved by additionally including word-aligned data during training. Incorporating word-level alignments into the parameter estimation of the IBM models reduces alignment error rate and increases the Bleu score when compared to training the same models only on sentence-aligned data. On the Verbmobil data set, we attain a 38% reduction in the alignment error rate and a higher Bleu score with half as many training examples. We discuss how varying the ratio of word-aligned to sentence-aligned data affects the expected performance gain.
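A toy version of mixing the two kinds of data in IBM Model 1 estimation: expected counts from sentence-aligned pairs and observed counts from gold word alignments enter the same M-step. The paper's actual modification of the IBM models is more involved; this only illustrates the principle.

```python
# Toy IBM Model 1 EM that mixes expected counts from sentence-aligned
# pairs with observed counts from word-aligned pairs. The data and the
# simple additive mixing are illustrative assumptions.
from collections import defaultdict

sent_pairs = [(["das", "haus"], ["the", "house"]),
              (["das", "buch"], ["the", "book"])]
word_pairs = [[("das", "the"), ("haus", "house")]]  # gold word links

t = defaultdict(lambda: 0.25)               # uniform init of t(f|e)
for _ in range(10):
    count = defaultdict(float); total = defaultdict(float)
    for fs, es in sent_pairs:               # E-step: expected counts
        for f in fs:
            z = sum(t[(f, e)] for e in es)
            for e in es:
                count[(f, e)] += t[(f, e)] / z
                total[e] += t[(f, e)] / z
    for links in word_pairs:                # observed counts from gold links
        for f, e in links:
            count[(f, e)] += 1.0
            total[e] += 1.0
    for (f, e), c in count.items():         # M-step
        t[(f, e)] = c / total[e]

print(round(t[("haus", "house")], 3))
```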
North American Chapter of the Association for Computational Linguistics | 2003
Stephen Clark; James R. Curran; Miles Osborne
This paper investigates bootstrapping part-of-speech taggers using co-training, in which two taggers are iteratively re-trained on each other's output. Since the output of the taggers is noisy, there is a question of which newly labelled examples to add to the training set. We investigate selecting examples by directly maximising tagger agreement on unlabelled data, a method which has been theoretically and empirically motivated in the co-training literature. Our results show that agreement-based co-training can significantly improve tagging performance for small seed datasets. Further results show that this form of co-training considerably outperforms self-training. However, we find that simply re-training on all the newly labelled data can, in some cases, yield comparable results to agreement-based co-training, with only a fraction of the computational cost.
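The agreement criterion itself is easy to state in code. A sketch, with hypothetical tagger outputs (the selection procedure built on top of it is more elaborate):

```python
# Token-level agreement between two taggers on the same raw sentences,
# the quantity that agreement-based co-training tries to maximise.
def agreement(tags_a, tags_b):
    """Fraction of tokens on which the two taggers agree."""
    total = sum(len(s) for s in tags_a)
    same = sum(ta == tb
               for sa, sb in zip(tags_a, tags_b)
               for ta, tb in zip(sa, sb))
    return same / total

# Hypothetical outputs of two taggers on two raw sentences:
out_a = [["DT", "NN", "VBZ"], ["PRP", "VBD"]]
out_b = [["DT", "NN", "VB"],  ["PRP", "VBD"]]
print(agreement(out_a, out_b))  # 0.8
```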
Meeting of the Association for Computational Linguistics | 2002
Miles Osborne
A maximum entropy classifier can be used to extract sentences from documents. Experiments using technical documents show that such a classifier tends to treat features in a categorical manner. This results in performance that is worse than when extracting sentences using a naive Bayes classifier. Addition of an optimised prior to the maximum entropy classifier improves performance over and above that of naive Bayes (even when naive Bayes is also extended with a similar prior). Further experiments show that, should we have at our disposal extremely informative features, then maximum entropy is able to yield excellent results. Naive Bayes, in contrast, cannot exploit these features and so fundamentally limits sentence extraction performance.
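In scikit-learn terms, a maximum entropy classifier with a Gaussian prior over its weights is L2-regularised logistic regression, so "optimising the prior" amounts to tuning the regularisation strength on held-out data. A sketch with invented sentence features:

```python
# Tuning the Gaussian prior of a maxent sentence-extraction classifier,
# realised here as the C parameter of L2-regularised logistic regression.
# The binary feature matrix and labels are invented for illustration.
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(1)
X = rng.integers(0, 2, size=(400, 30)).astype(float)  # binary sentence features
y = (X[:, :3].sum(axis=1) > 1).astype(int)            # "extract this sentence?"
X_tr, X_dev, y_tr, y_dev = train_test_split(X, y, random_state=0)

best = max(
    (LogisticRegression(C=c, max_iter=1000).fit(X_tr, y_tr)
     for c in [0.01, 0.1, 1, 10]),                    # candidate prior strengths
    key=lambda m: m.score(X_dev, y_dev),              # pick on held-out accuracy
)
print(best.C, best.score(X_dev, y_dev))
```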
International Joint Conference on Natural Language Processing | 2009
Phil Blunsom; Trevor Cohn; Chris Dyer; Miles Osborne
We present a phrasal synchronous grammar model of translational equivalence. Unlike previous approaches, we do not resort to heuristics or constraints from a word-alignment model, but instead directly induce a synchronous grammar from parallel sentence-aligned corpora. We use a hierarchical Bayesian prior to bias towards compact grammars with small translation units. Inference is performed using a novel Gibbs sampler over synchronous derivations. This sampler side-steps the intractability issues of previous models which required inference over derivation forests. Instead each sampling iteration is highly efficient, allowing the model to be applied to larger translation corpora than previous approaches.
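The flavour of such a sampler can be conveyed by the collapsed predictive probability it repeatedly evaluates: a candidate translation unit is scored by its current count plus a base measure that penalises large units, so small, reused units are preferred. The concentration parameter and base measure below are illustrative assumptions, not the paper's model.

```python
# Collapsed predictive probability of a translation unit under a
# Dirichlet-process-style prior. Counts, alpha and the base measure
# are invented; they only illustrate the bias towards small, reused units.
from collections import Counter

counts = Counter({("la maison", "the house"): 3, ("maison", "house"): 5})
alpha = 1.0

def base(unit):
    src, tgt = unit
    # Base measure that decays with unit size, biasing towards small units.
    return 0.5 ** (len(src.split()) + len(tgt.split()))

def predictive(unit):
    n = sum(counts.values())
    return (counts[unit] + alpha * base(unit)) / (n + alpha)

print(predictive(("maison", "house")))                      # reused, short: high
print(predictive(("la maison verte", "the green house")))   # new, long: low
```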
Workshop on Statistical Machine Translation | 2007
Alexandra Birch; Miles Osborne; Philipp Koehn
Combinatory Categorial Grammar (CCG) supertags present phrase-based machine translation with an opportunity to access rich syntactic information at a word level. The challenge is incorporating this information into the translation process. Factored translation models allow the inclusion of supertags as a factor in the source or target language. We show that this results in an improvement in the quality of translation and that the value of syntactic supertags in flat structured phrase-based models is largely due to better local reorderings.
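Factored models consume tokens carrying extra factors; Moses, for example, reads factors joined by "|". A minimal sketch of attaching a supertag factor to each word (the supertags shown are illustrative):

```python
# Build a factored-corpus line in the common word|factor text format,
# here with a CCG supertag as the extra factor. Tags are illustrative.
def to_factored(words, supertags):
    return " ".join(f"{w}|{t}" for w, t in zip(words, supertags))

words = ["Peter", "eats", "apples"]
tags = ["NP", r"(S\NP)/NP", "NP"]
print(to_factored(words, tags))
# Peter|NP eats|(S\NP)/NP apples|NP
```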
International ACM SIGIR Conference on Research and Development in Information Retrieval | 2013
Zhunchen Luo; Miles Osborne; Jintao Tang; Ting Wang
An important aspect of communication in Twitter (and other social networks) is message propagation -- people creating posts for others to share. Although there has been work on modelling how tweets in Twitter are propagated (retweeted), an untackled problem has been predicting who will retweet a message. Here we consider the task of finding who will retweet a message posted on Twitter. Within a learning-to-rank framework, we explore a wide range of features, such as retweet history, followers' status, followers' active time and followers' interests. We find that followers who frequently retweeted or mentioned the author's tweets before and who have common interests are more likely to be retweeters.
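A sketch of the setup: score each follower of the author with features of the kinds described, then rank. A pointwise probabilistic scorer stands in for the paper's learning-to-rank model, and every feature value below is invented.

```python
# Rank a tweet author's followers by retweet likelihood. Feature columns
# mirror the kinds of signals described (retweet history, mentions,
# shared interests); names and numbers are invented for illustration.
import numpy as np
from sklearn.linear_model import LogisticRegression

# columns: past retweets of author, mentions of author, interest overlap
X_train = np.array([[5, 2, 0.8], [0, 0, 0.1], [3, 1, 0.5], [0, 1, 0.2]])
y_train = np.array([1, 0, 1, 0])          # did this follower retweet?

model = LogisticRegression().fit(X_train, y_train)

followers = {"@alice": [4, 3, 0.7], "@bob": [0, 0, 0.05]}
scores = {f: model.predict_proba([x])[0, 1] for f, x in followers.items()}
print(sorted(scores, key=scores.get, reverse=True))  # likely retweeters first
```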
Journal of Machine Learning Research | 2002
James Hammerton; Miles Osborne; Susan Armstrong; Walter Daelemans
This article introduces the problem of partial or shallow parsing (assigning partial syntactic structure to sentences) and explains why it is an important natural language processing (NLP) task. The complexity of the task makes Machine Learning an attractive option in comparison to the handcrafting of rules. On the other hand, because of the same task complexity, shallow parsing makes an excellent benchmark problem for evaluating machine learning algorithms. We sketch the origins of shallow parsing as a specific task for machine learning of language, and introduce the articles accepted for this special issue, a representative sample of current research in this area. Finally, future directions for machine learning of shallow parsing are suggested.
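As a concrete anchor for readers new to the task: shallow parsing is commonly cast as tagging each token with a BIO chunk label, and chunks are read off the tag sequence. A small sketch, with a CoNLL-2000-style example sentence:

```python
# Recover flat chunk spans from per-token BIO labels, the standard
# encoding of shallow parses. Example sentence and tags are illustrative.
def bio_to_chunks(tokens, tags):
    chunks, start, label = [], None, None
    for i, tag in enumerate(tags + ["O"]):          # sentinel flushes last chunk
        if tag.startswith("B-") or tag == "O" or (
                tag.startswith("I-") and tag[2:] != label):
            if start is not None:
                chunks.append((label, tokens[start:i]))
            start, label = (i, tag[2:]) if tag != "O" else (None, None)
    return chunks

tokens = ["He", "reckons", "the", "current", "account", "deficit"]
tags = ["B-NP", "B-VP", "B-NP", "I-NP", "I-NP", "I-NP"]
print(bio_to_chunks(tokens, tags))
# [('NP', ['He']), ('VP', ['reckons']), ('NP', ['the', 'current', 'account', 'deficit'])]
```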