Julian Brooke
University of Toronto
Publications
Featured research published by Julian Brooke.
Computational Linguistics | 2011
Maite Taboada; Julian Brooke; Milan Tofiloski; Kimberly D. Voll; Manfred Stede
We present a lexicon-based approach to extracting sentiment from text. The Semantic Orientation CALculator (SO-CAL) uses dictionaries of words annotated with their semantic orientation (polarity and strength), and incorporates intensification and negation. SO-CAL is applied to the polarity classification task, the process of assigning a positive or negative label to a text that captures the text's opinion towards its main subject matter. We show that SO-CAL's performance is consistent across domains and on completely unseen data. Additionally, we describe the process of dictionary creation, and our use of Mechanical Turk to check dictionaries for consistency and reliability.
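The general idea of a lexicon-based scorer with intensification and negation can be illustrated with a toy sketch. Everything below is an assumption made for illustration: the word lists, the numeric strengths, the multiplicative intensifier weights, and the simple sign flip for negation are hypothetical and are not SO-CAL's actual dictionaries or rules.

```python
# Toy lexicon-based polarity scorer. All dictionary entries and weights are
# made-up illustrative values, not SO-CAL's actual dictionaries.
LEXICON = {"good": 3, "great": 4, "bad": -3, "terrible": -5, "boring": -2}
INTENSIFIERS = {"very": 1.5, "really": 1.3, "slightly": 0.5}
NEGATORS = {"not", "never", "no"}
PUNCTUATION = ".,!?;:"


def score_text(text):
    """Sum word scores, applying modifiers that directly precede each word."""
    tokens = text.lower().split()
    total = 0.0
    for i, token in enumerate(tokens):
        word = token.strip(PUNCTUATION)
        if word not in LEXICON:
            continue
        value = float(LEXICON[word])
        j = i - 1
        # Walk back over the modifiers immediately before the sentiment word.
        while j >= 0:
            prev = tokens[j].strip(PUNCTUATION)
            if prev in INTENSIFIERS:
                value *= INTENSIFIERS[prev]
            elif prev in NEGATORS:
                value = -value  # assumption: negation simply flips the sign
            else:
                break
            j -= 1
        total += value
    return total


def classify(text):
    """Label a text positive or negative from the sign of its total score."""
    return "positive" if score_text(text) > 0 else "negative"
```

For example, `classify("not very good and really boring")` combines a negated, intensified positive word with an intensified negative word and labels the text negative.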
Annual Meeting of the Special Interest Group on Discourse and Dialogue | 2009
Maite Taboada; Julian Brooke; Manfred Stede
We present a taxonomy and classification system for distinguishing between different types of paragraphs in movie reviews: formal vs. functional paragraphs and, within the latter, between description and comment. The classification is used for sentiment extraction, achieving improvement over a baseline without paragraph classification.
Meeting of the Association for Computational Linguistics | 2009
Milan Tofiloski; Julian Brooke; Maite Taboada
We present a syntactic and lexically based discourse segmenter (SLSeg) that is designed to avoid the common problem of over-segmenting text. Segmentation is the first step in a discourse parser, a system that constructs discourse trees from elementary discourse units. We compare SLSeg to a probabilistic segmenter, showing that a conservative approach increases precision at the expense of recall, while retaining a high F-score across both formal and informal texts.
North American Chapter of the Association for Computational Linguistics | 2015
Julian Brooke; Adam Hammond; Graeme Hirst
This paper introduces a software tool, GutenTag, which is aimed at giving literary researchers direct access to NLP techniques for the analysis of texts in the Project Gutenberg corpus. We discuss several facets of the tool, including the handling of formatting and structure, the use and expansion of metadata which is used to identify relevant subcorpora of interest, and a general tagging framework which is intended to cover a wide variety of future NLP modules. Our hope is that the shared ground created by this tool will help create new kinds of interaction between the computational linguistics and digital humanities communities, to the benefit of both.
Conference on Information and Knowledge Management | 2009
Julian Brooke; Matthew Hurst
A qualitative examination of review texts suggests that there are consistent patterns to how topic and polarity are expressed in discourse. These patterns are visible in the text and paragraph structure, topic depth, and polarity flow. In this paper, we employ sentence-level sentiment classifiers and a hand-built tree ontology to investigate whether these patterns can be quantitatively identified in a large corpus of video game reviews. Our results indicate that the beginning and the end of major textual units (e.g. paragraphs) stand out in the flow of texts, showing a concentration of reliable opinion and key topic aspects, and that there are other important regularities in the expression of opinion and topic relevant to their ordering and the discourse markers with which they appear.
Meeting of the Association for Computational Linguistics | 2016
Julian Brooke; Adam Hammond; Timothy Baldwin
We present a named entity recognition (NER) system for tagging fiction: LitNER. Relative to more traditional approaches, LitNER has two important properties: (1) it makes no use of hand-tagged data or gazetteers, instead bootstrapping a model from term clusters; and (2) it leverages multiple instances of the same name in a text. Our experiments show it to substantially outperform off-the-shelf supervised NER systems.
American Speech | 2014
Sali A. Tagliamonte; Julian Brooke
This article presents a synchronic quantitative study of adjectives in the semantic field of strangeness in a large North American city, Toronto, the largest urban center in Canada. The analysis is based on nearly 2,000 adjectives, representing 11 different types, as in "She's really weird" and "She's odd." The distribution of these adjectives in apparent time provides startling evidence of change. The adjective strange is quickly moving out of favor, and weird has expanded dramatically, usurping all other forms. Neither linguistic nor social factors are implicated in this change, suggesting that lexical replacement is the prevailing mechanism driving the development. Consideration of the broader context reveals that renewal and recycling of these adjectives is rooted in the history of English and is progressing in parallel at least across British and North American English. The actuation of the shift toward weird may be rooted in developments in literature and mass media, revealing that adjectives are a vibrant area of the grammar that may be used to track cultural influences on linguistic change.
In this article, we target a little-studied topic in dialect or variation research, adjectives. This area of grammar is vast in lexical variety, with overlapping meanings and apparently random choices, undoubtedly one of the reasons it has not been the subject of quantitative investigation till now. Our first aim is to document how to approach dialect differences, variation, and change in adjectival use systematically. We begin by utilizing a series of computational techniques to explore the data to delimit the investigation. Crucially, we have at our fingertips one of the largest corpora of spoken vernacular North American English, the Toronto English Archive (TEA). Moreover, it is socially stratified and sampled across a wide age range of individuals born from the early to late twentieth century.
Together, the computational methods and this substantive data set provide key elements for uncovering relevant and timely variation within the adjectives of contemporary (North American) English. Following sociolinguistic methods, we
North American Chapter of the Association for Computational Linguistics | 2015
Julian Brooke; Adam Hammond; David Jacob; Vivian Tsang; Graeme Hirst; Fraser Shein
Though the multiword lexicon has long been of interest in computational linguistics, most relevant work is targeted at only a small portion of it. Our work is motivated by the needs of learners for more comprehensive resources reflecting formulaic language that goes beyond what is likely to be codified in a dictionary. Working from an initial sequential segmentation approach, we present two enhancements: the use of a new measure to promote the identification of lexicalized sequences, and an expansion to include sequences with gaps. We evaluate using a novel method that allows us to calculate an estimate of recall without a reference lexicon, showing that good performance in the second enhancement depends crucially on the first, and that our lexicon conforms much more with human judgment of formulaic language than alternatives.
Proceedings of the 13th Workshop on Multiword Expressions (MWE 2017) | 2017
King Chan; Julian Brooke; Timothy Baldwin
This paper presents a methodology for identifying and resolving various kinds of inconsistency in the context of merging dependency and multiword expression (MWE) annotations, to generate a dependency treebank with comprehensive MWE annotations. Candidates for correction are identified using a variety of heuristics, including an entirely novel one which identifies violations of MWE constituency in the dependency tree, and resolved by arbitration with minimal human intervention. Using this technique, we identified and corrected several hundred errors across both parse and MWE annotations, representing changes to a significant percentage (well over 10%) of the MWE instances in the joint corpus.
Recent Advances in Natural Language Processing | 2009
Julian Brooke; Milan Tofiloski; Maite Taboada