John A. Carroll | Researchain

Archive Network Publication Hotspot Collaboration

Network

Latest external collaboration on country level. Dive into details by clicking on the dots.

Explore More

Hotspot

Dive into the research topics where John A. Carroll is active.

Explore More

Publication

Featured researches published by John A. Carroll.

meeting of the association for computational linguistics | 2006

The Second Release of the RASP System

Ted Briscoe; John A. Carroll; Rebecca Watson

We describe the new release of the RASP (robust accurate statistical parsing) system, designed for syntactic annotation of free text. The new version includes a revised and more semantically-motivated output representation, an enhanced grammar and part-of-speech tagger lexicon, and a more flexible and semi-supervised training method for the structural parse ranking model. We evaluate the released version on the WSJ using a relational evaluation scheme, and describe how the new release allows users to enhance performance using (in-domain) lexical information.

Natural Language Engineering | 2001

Applied morphological processing of English

Guido Minnen; John A. Carroll; Darren Pearce

We describe two newly developed computational tools for morphological processing: a program for analysis of English inflectional morphology, and a morphological generator, automatically derived from the analyser. The tools are fast, being based on finite-state techniques, have wide coverage, incorporating data from various corpora and machine readable dictionaries, and are robust, in that they are able to deal effectively with unknown words. The tools are freely available. We evaluate the accuracy and speed of both tools and discuss a number of practical applications in which they have been put to use.

conference on applied natural language processing | 1997

Automatic Extraction of Subcategorization from Corpora

Ted Briscoe; John A. Carroll

We describe a novel technique and implemented system for constructing a subcategorization dictionary from textual corpora. Each dictionary entry encodes the relative frequency of occurrence of a comprehensive set of subcategorization classes for English. An initial experiment, on a sample of 14 verbs which exhibit multiple complementation patterns, demonstrates that the technique achieves accuracy comparable to previous approaches, which are all limited to a highly restricted set of subcategorization classes. We also demonstrate that a subcategorization dictionary built with the system improves the accuracy of a parser by an appreciable amount1.

meeting of the association for computational linguistics | 2004

Finding Predominant Word Senses in Untagged Text

Diana McCarthy; Rob Koeling; Julie Weeds; John A. Carroll

In word sense disambiguation (WSD), the heuristic of choosing the most common sense is extremely powerful because the distribution of the senses of a word is often skewed. The problem with using the predominant, or first sense heuristic, aside from the fact that it does not take surrounding context into account, is that it assumes some quantity of hand-tagged data. Whilst there are a few hand-tagged corpora available for some languages, one would expect the frequency distribution of the senses of words, particularly topical words, to depend on the genre and domain of the text under consideration. We present work on the use of a thesaurus acquired from raw textual corpora and the WordNet similarity package to find predominant noun senses automatically. The acquired predominant senses give a precision of 64% on the nouns of the SENSEVAL-2 English all-words task. This is a very promising result given that our method does not require any hand-tagged text, such as SemCor. Furthermore, we demonstrate that our method discovers appropriate predominant senses for words from two domain-specific corpora.

meeting of the association for computational linguistics | 2003

Detecting a Continuum of Compositionality in Phrasal Verbs

Diana McCarthy; Bill Keller; John A. Carroll

We investigate the use of an automatically acquired thesaurus for measures designed to indicate the compositionality of candidate multiword verbs, specifically English phrasal verbs identified automatically using a robust parser. We examine various measures using the nearest neighbours of the phrasal verb, and in some cases the neighbours of the simplex counterpart and show that some of these correlate significantly with human rankings of compositionality on the test set. We also show that whilst the compositionality judgements correlate with some statistics commonly used for extracting multiwords, the relationship is not as strong as that using the automatically constructed thesaurus.

Computational Linguistics | 2003

Disambiguating Nouns, Verbs, and Adjectives Using Automatically Acquired Selectional Preferences

Diana McCarthy; John A. Carroll

Selectional preferences have been used by word sense disambiguation (WSD) systems as one source of disambiguating information. We evaluate WSD using selectional preferences acquired for English adjectivenoun, subject, and direct object grammatical relationships with respect to a standard test corpus. The selectional preferences are specific to verb or adjective classes, rather than individual word forms, so they can be used to disambiguate the co-occurring adjectives and verbs, rather than just the nominal argument heads. We also investigate use of the one-senseper-discourse heuristic to propagate a sense tag for a word to other occurrences of the same word within the current document in order to increase coverage. Although the preferences perform well in comparison with other unsupervised WSD systems on the same corpus, the results show that for many applications, further knowledge sources would be required to achieve an adequate level of accuracy and coverage. In addition to quantifying performance, we analyze the results to investigate the situations in which the selectional preferences achieve the best precision and in which the one-sense-per-discourse heuristic increases performance.

IEEE Transactions on Knowledge and Data Engineering | 2013

Cross-Domain Sentiment Classification Using a Sentiment Sensitive Thesaurus

Danushka Bollegala; David J. Weir; John A. Carroll

Automatic classification of sentiment is important for numerous applications such as opinion mining, opinion summarization, contextual advertising, and market analysis. Typically, sentiment classification has been modeled as the problem of training a binary classifier using reviews annotated for positive or negative sentiment. However, sentiment is expressed differently in different domains, and annotating corpora for every possible domain of interest is costly. Applying a sentiment classifier trained using labeled data for a particular domain to classify sentiment of user reviews on a different domain often results in poor performance because words that occur in the train (source) domain might not appear in the test (target) domain. We propose a method to overcome this problem in cross-domain sentiment classification. First, we create a sentiment sensitive distributional thesaurus using labeled data for the source domains and unlabeled data for both source and target domains. Sentiment sensitivity is achieved in the thesaurus by incorporating document level sentiment labels in the context vectors used as the basis for measuring the distributional similarity between words. Next, we use the created thesaurus to expand feature vectors during train and test times in a binary classifier. The proposed method significantly outperforms numerous baselines and returns results that are comparable with previously proposed cross-domain sentiment classification methods on a benchmark data set containing Amazon user reviews for different types of products. We conduct an extensive empirical analysis of the proposed method on single- and multisource domain adaptation, unsupervised and supervised domain adaptation, and numerous similarity measures for creating the sentiment sensitive thesaurus. Moreover, our comparisons against the SentiWordNet, a lexical resource for word polarity, show that the created sentiment-sensitive thesaurus accurately captures words that express similar sentiments.

international conference on computational linguistics | 2008

Automatic Seed Word Selection for Unsupervised Sentiment Classification of Chinese Text

Taras Zagibalov; John A. Carroll

We describe and evaluate a new method of automatic seed word selection for un-supervised sentiment classification of product reviews in Chinese. The whole method is unsupervised and does not require any annotated training data; it only requires information about commonly occurring negations and adverbials. Unsupervised techniques are promising for this task since they avoid problems of domain-dependency typically associated with supervised methods. The results obtained are close to those of supervised classifiers and sometimes better, up to an F1 of 92%.

Computational Linguistics | 2007

Unsupervised acquisition of predominant word senses

Diana McCarthy; Rob Koeling; Julie Weeds; John A. Carroll

There has been a great deal of recent research into word sense disambiguation, particularly since the inception of the Senseval evaluation exercises. Because a word often has more than one meaning, resolving word sense ambiguity could benefit applications that need some level of semantic interpretation of language input. A major problem is that the accuracy of word sense disambiguation systems is strongly dependent on the quantity of manually sense-tagged data available, and even the best systems, when tagging every word token in a document, perform little better than a simple heuristic that guesses the first, or predominant, sense of a word in all contexts. The success of this heuristic is due to the skewed nature of word sense distributions. Data for the heuristic can come from either dictionaries or a sample of sense-tagged data. However, there is a limited supply of the latter, and the sense distributions and predominant sense of a word can depend on the domain or source of a document. (The first sense of star for example would be different in the popular press and scientific journals). In this article, we expand on a previously proposed method for determining the predominant sense of a word automatically from raw text. We look at a number of different data sources and parameterizations of the method, using evaluation results and error analyses to identify where the method performs well and also where it does not. In particular, we find that the method does not work as well for verbs and adverbs as nouns and adjectives, but produces more accurate predominant sense information than the widely used SemCor corpus for nouns with low coverage in that corpus. We further show that the method is able to adapt successfully to domains when using domain specific corpora as input and where the input can either be hand-labeled for domain or automatically classified.

international conference on natural language generation | 2000

Robust, applied morphological generation

Guido Minnen; John A. Carroll; Darren Pearce

In practical natural language generation systems it is often advantageous to have a separate component that deals purely with morphological processing. We present such a component: a fast and robust morphological generator for English based on finite-state techniques that generates a word form given a specification of the lemma, part-of-speech, and the type of inflection required. We describe how this morphological generator is used in a prototype system for automatic simplification of English newspaper text, and discuss practical morphological and orthographic issues we have encountered in generation of unrestricted text within this application.

Explore More