Publication


Featured research published by Alistair Kennedy.


Computational Intelligence | 2006

Sentiment Classification of Movie Reviews Using Contextual Valence Shifters

Alistair Kennedy; Diana Inkpen

We present two methods for determining the sentiment expressed by a movie review. The semantic orientation of a review can be positive, negative, or neutral. We examine the effect of valence shifters on classifying the reviews. We examine three types of valence shifters: negations, intensifiers, and diminishers. Negations are used to reverse the semantic polarity of a particular term, while intensifiers and diminishers are used to increase and decrease, respectively, the degree to which a term is positive or negative. The first method classifies reviews based on the number of positive and negative terms they contain. We use the General Inquirer to identify positive and negative terms, as well as negation terms, intensifiers, and diminishers. We also use positive and negative terms from other sources, including a dictionary of synonym differences and a very large Web corpus. To compute corpus‐based semantic orientation values of terms, we use their association scores with a small group of positive and negative terms. We show that extending the term‐counting method with contextual valence shifters improves the accuracy of the classification. The second method uses a Machine Learning algorithm, Support Vector Machines. We start with unigram features and then add bigrams that consist of a valence shifter and another word. The accuracy of classification is very high, and the valence shifter bigrams slightly improve it. The features that contribute to the high accuracy are the words in the lists of positive and negative terms. Previous work focused on either the term‐counting method or the Machine Learning method. We show that combining the two methods achieves better results than either method alone.
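The term-counting method with contextual valence shifters can be sketched minimally. The toy lexicons below stand in for the General Inquirer lists used in the paper (real lists contain thousands of entries), and the weighting scheme is an illustrative assumption, not the paper's exact formula:

```python
from typing import List

# Hypothetical mini-lexicons; placeholders for the General Inquirer lists.
POSITIVE = {"good", "great", "enjoyable", "brilliant"}
NEGATIVE = {"bad", "boring", "awful", "dull"}
NEGATIONS = {"not", "never", "no"}
INTENSIFIERS = {"very", "extremely"}    # strengthen a sentiment term
DIMINISHERS = {"slightly", "somewhat"}  # weaken a sentiment term

def classify_review(tokens: List[str]) -> str:
    """Term counting with valence shifters: a shifter immediately
    before a sentiment word flips or rescales its contribution."""
    score = 0
    for i, word in enumerate(tokens):
        if word not in POSITIVE and word not in NEGATIVE:
            continue
        polarity = 1 if word in POSITIVE else -1
        weight = 2  # base weight of a sentiment term
        prev = tokens[i - 1] if i > 0 else ""
        if prev in NEGATIONS:
            polarity = -polarity   # "not good" counts as negative
        elif prev in INTENSIFIERS:
            weight += 1            # "extremely dull" counts more
        elif prev in DIMINISHERS:
            weight -= 1            # "slightly dull" counts less
        score += polarity * weight
    if score > 0:
        return "positive"
    if score < 0:
        return "negative"
    return "neutral"
```

For example, `classify_review("the film was not good and extremely dull".split())` returns `"negative"`: the negation flips "good" and the intensifier strengthens "dull".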


Hawaii International Conference on System Sciences | 2005

Automatic Identification of Home Pages on the Web

Alistair Kennedy; Michael A. Shepherd

The research reported in this paper is the first phase of a larger project on the automatic classification of Web pages by their genres. The long-term goal is the incorporation of Web page genre into the search process to improve the quality of the search results. In this phase, a neural net classifier was trained to distinguish home pages from non-home pages and to classify those home pages as personal, corporate, or organization home pages. Results indicate that the classifier is able to distinguish home pages from non-home pages, and within the home page genre it is able to distinguish personal from corporate home pages. Organization home pages, however, were more difficult to distinguish from personal and corporate home pages.


Canadian Conference on Artificial Intelligence | 2012

Getting emotional about news summarization

Alistair Kennedy; Anna Kazantseva; Diana Inkpen; Stan Szpakowicz

News is not simply a straight re-telling of events, but rather an interpretation of those events by a reporter, whose feelings and opinions can often become part of the story itself. Research on automatic summarization of news articles has thus far focused on facts rather than emotions, but perhaps emotions can be significant in news stories too. This article describes research done at the University of Ottawa to create an emotion-aware summarization system, which participated in the Text Analysis Conference last year. We established that increasing the number of emotional words could help rank sentences for selection into the summary, but there was no overall improvement in the final system. Although this experiment did not improve news summarization as evaluated by a variety of standard scoring techniques, it was successful at generating summaries with more emotional words while maintaining the overall quality of the summary.


Text, Speech and Dialogue | 2010

Evaluation of a sentence ranker for text summarization based on Roget's Thesaurus

Alistair Kennedy; Stan Szpakowicz

Evaluation is one of the hardest tasks in automatic text summarization. It is perhaps even harder to determine how much a particular component of a summarization system contributes to the success of the whole system. We examine how to evaluate the sentence ranking component using a corpus which has been partially labelled with Summary Content Units. To demonstrate this technique, we apply it to the evaluation of a new sentence-ranking system which uses Roget's Thesaurus. This corpus provides a quick and nearly automatic method of evaluating the quality of sentence ranking.


Canadian Conference on Artificial Intelligence | 2011

A supervised method of feature weighting for measuring semantic relatedness

Alistair Kennedy; Stan Szpakowicz

The clustering of related words is crucial for a variety of Natural Language Processing applications. Many known techniques of word clustering use the context of a word to determine its meaning. Words which frequently appear in similar contexts are assumed to have similar meanings. Word clustering usually applies the weighting of contexts, based on some measure of their importance. One of the most popular measures is Pointwise Mutual Information. It increases the weight of contexts where a word appears regularly but other words do not, and decreases the weight of contexts where many words may appear. Essentially, it is unsupervised feature weighting. We present a method of supervised feature weighting. It identifies contexts shared by pairs of words known to be semantically related or unrelated, and then uses Pointwise Mutual Information to weight these contexts on how well they indicate closely related words. We use Roget's Thesaurus as a source of training and evaluation data. This work is a step towards adding new terms to Roget's Thesaurus automatically, and doing so with high confidence.
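The unsupervised PMI weighting described above, pmi(w, c) = log( p(w, c) / (p(w) p(c)) ), can be sketched over a word-context count matrix. This is a generic illustration of the standard measure, not the paper's supervised variant; the data structure (a dict of pair counts) is an assumption for simplicity:

```python
import math
from collections import defaultdict

def pmi_weights(counts: dict) -> dict:
    """Pointwise Mutual Information over a word-context count matrix.
    `counts` maps (word, context) pairs to co-occurrence counts;
    returns the same pairs mapped to log(p(w,c) / (p(w) * p(c)))."""
    total = sum(counts.values())
    word_sum = defaultdict(float)
    ctx_sum = defaultdict(float)
    for (w, c), n in counts.items():
        word_sum[w] += n
        ctx_sum[c] += n
    weights = {}
    for (w, c), n in counts.items():
        p_wc = n / total
        p_w = word_sum[w] / total
        p_c = ctx_sum[c] / total
        weights[(w, c)] = math.log(p_wc / (p_w * p_c))
    return weights
```

On toy counts such as `{("cat", "purr"): 4, ("cat", "run"): 1, ("dog", "run"): 4, ("dog", "purr"): 1}`, the pair ("cat", "purr") gets a positive weight (they co-occur more than chance predicts) while ("cat", "run") gets a negative one, which is exactly the behaviour the abstract describes.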


Text, Speech and Dialogue | 2007

Disambiguating hypernym relations for Roget's Thesaurus

Alistair Kennedy; Stanisław Szpakowicz

Roget's Thesaurus is a lexical resource which groups terms by semantic relatedness. A shortcoming of Roget's is that the relations are ambiguous: it does not name them; it only shows that there is a relation between terms. Our work focuses on disambiguating hypernym relations within Roget's Thesaurus. Several techniques of identifying hypernym relations are compared and contrasted in this paper, and a total of over 50,000 hypernym relations have been disambiguated within Roget's. Human judges have evaluated the quality of our disambiguation techniques, and we have demonstrated on several applications the usefulness of the disambiguated relations.


Text, Speech and Dialogue | 2012

Supervised Distributional Semantic Relatedness

Alistair Kennedy; Stan Szpakowicz

Distributional measures of semantic relatedness determine word similarity based on how frequently a pair of words appear in the same contexts. A typical method is to construct a word-context matrix, then re-weight it using some measure of association, and finally take the vector distance as a measure of similarity. This has largely been an unsupervised process, but in recent years more work has been done devising methods of using known sets of synonyms to enhance relatedness measures. This paper examines and expands on one such measure, which learns a weighting of a word-context matrix by measuring associations between words appearing in a given context and sets of known synonyms. In doing so we propose a general method of learning weights for word-context matrices, and evaluate it on a word similarity task. This method works with a variety of measures of association and can be trained with synonyms from any resource.
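The pipeline the abstract outlines (build a word-context matrix, re-weight it, take a vector distance as similarity) can be sketched with cosine similarity over sparse rows. The weights and vocabulary below are made up for illustration; in practice the rows would come from a large corpus and the re-weighting from a measure of association such as PMI:

```python
import math

def cosine(u: dict, v: dict) -> float:
    """Cosine similarity between two sparse context vectors
    (dicts mapping context -> weight); 1.0 means identical direction."""
    dot = sum(w * v.get(c, 0.0) for c, w in u.items())
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0

# Illustrative rows of a re-weighted word-context matrix.
rows = {
    "film":  {"watch": 2.1, "direct": 1.3, "review": 0.9},
    "movie": {"watch": 1.9, "direct": 1.1, "review": 1.2},
    "tax":   {"pay": 2.5, "levy": 1.7},
}
```

Here `cosine(rows["film"], rows["movie"])` exceeds `cosine(rows["film"], rows["tax"])`, since "film" and "movie" share contexts while "film" and "tax" share none; a supervised weighting of the matrix changes the weights, not this final distance step.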


Journal of Language Modelling | 2014

Evaluation of Automatic Updates of Roget's Thesaurus

Alistair Kennedy; Stan Szpakowicz

Thesauri and similarly organised resources attract increasing interest of Natural Language Processing researchers. Thesauri age fast, so there is a constant need to update their vocabulary. Since a manual update cycle takes considerable time, automated methods are required. This work presents a tuneable method of measuring semantic relatedness, trained on Roget’s Thesaurus, which generates lists of terms related to words not yet in the Thesaurus. Using these lists of terms, we experiment with three methods of adding words to the Thesaurus. We add, with high confidence, over 5500 and 9600 new word senses to versions of Roget’s Thesaurus from 1911 and 1987 respectively. We evaluate our work both manually and by applying the updated thesauri in three NLP tasks: selection of the best synonym from a set of candidates, pseudo-word-sense disambiguation and SAT-style analogy problems. We find that the newly added words are of high quality. The additions significantly improve the performance of Roget’s-based methods in these NLP tasks. The performance of our system compares favourably with that of WordNet-based methods. Our methods are general enough to work with different versions of Roget’s Thesaurus.


Canadian Conference on Artificial Intelligence | 2010

Automatically expanding the lexicon of Roget's Thesaurus

Alistair Kennedy

In recent years much research has been conducted on building thesauri and enhancing them with new terms and relationships. I propose to build and evaluate a system for automatically updating the lexicon of Roget's Thesaurus. Roget's has been shown to lend itself well to many Natural Language Processing tasks. One of the factors limiting Roget's use is that the only publicly available version is from 1911 and is sorely in need of an updated lexicon.


Canadian Conference on Artificial Intelligence | 2010

Toward a gold standard for extractive text summarization

Alistair Kennedy; Stan Szpakowicz

Extractive text summarization is the process of selecting relevant sentences from a collection of documents, perhaps only a single document, and arranging such sentences in a purposeful way to form a summary of this collection. The question arises just how good extractive summarization can ever be. Without generating language to express the gist of a text – its abstract – can we expect to make summaries which are both readable and informative? In search of an answer, we employed a corpus partially labelled with Summary Content Units: snippets which convey the main ideas in the document collection. Starting from this corpus, we created SCU-optimal summaries for extractive summarization. We support the claim of optimality with a series of experiments.
