Brendan O'Connor | Researchain

Publication

Featured research published by Brendan O'Connor.

empirical methods in natural language processing | 2008

Cheap and Fast -- But is it Good? Evaluating Non-Expert Annotations for Natural Language Tasks

Rion Snow; Brendan O'Connor; Daniel Jurafsky; Andrew Y. Ng

Human linguistic annotation is crucial for many natural language processing tasks but can be expensive and time-consuming. We explore the use of Amazon's Mechanical Turk system, a significantly cheaper and faster method for collecting annotations from a broad base of paid non-expert contributors over the Web. We investigate five tasks: affect recognition, word similarity, recognizing textual entailment, event temporal ordering, and word sense disambiguation. For all five, we show high agreement between Mechanical Turk non-expert annotations and existing gold standard labels provided by expert labelers. For the task of affect recognition, we also show that using non-expert labels for training machine learning algorithms can be as effective as using gold standard annotations from experts. We propose a technique for bias correction that significantly improves annotation quality on two tasks. We conclude that many large labeling tasks can be effectively designed and carried out in this method at a fraction of the usual expense.
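As a rough illustration of the comparison described in this abstract, the sketch below combines several non-expert labels per item by majority vote and measures how often the result matches a gold label. This is a minimal sketch under assumptions, not the paper's procedure: the function names, the label strings, and the toy data are all hypothetical, and the bias-correction technique mentioned in the abstract is not modeled.

```python
from collections import Counter


def majority_vote(labels):
    """Return the most common label; ties go to the label seen first."""
    return Counter(labels).most_common(1)[0][0]


def agreement_with_gold(annotations, gold):
    """Fraction of items whose majority-vote label equals the gold label."""
    matches = sum(
        majority_vote(annotations[item]) == gold[item] for item in gold
    )
    return matches / len(gold)


# Hypothetical toy data: three non-expert labels per item (not from the paper).
annotations = {
    "pair1": ["entails", "entails", "not_entails"],
    "pair2": ["not_entails", "not_entails", "entails"],
    "pair3": ["entails", "not_entails", "entails"],
    "pair4": ["entails", "entails", "entails"],
}
gold = {
    "pair1": "entails",
    "pair2": "not_entails",
    "pair3": "not_entails",
    "pair4": "entails",
}

accuracy = agreement_with_gold(annotations, gold)
print(f"agreement with gold: {accuracy:.2f}")
```

Majority voting only suits categorical labels; for numeric ratings such as word similarity, averaging the ratings and correlating them with the gold scores would be the analogous check.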

PLOS ONE | 2014

Diffusion of lexical change in social media.

Jacob Eisenstein; Brendan O'Connor; Noah A. Smith; Eric P. Xing

Computer-mediated communication is driving fundamental changes in the nature of written language. We investigate these changes by statistical analysis of a dataset comprising 107 million Twitter messages (authored by 2.7 million unique user accounts). Using a latent vector autoregressive model to aggregate across thousands of words, we identify high-level patterns in diffusion of linguistic change over the United States. Our model is robust to unpredictable changes in Twitter's sampling rate, and provides a probabilistic characterization of the relationship of macro-scale linguistic influence to a set of demographic and geographic predictors. The results of this analysis offer support for prior arguments that focus on geographical proximity and population size. However, demographic similarity – especially with regard to race – plays an even more central role, as cities with similar racial demographics are far more likely to share linguistic influence. Rather than moving towards a single unified “netspeak” dialect, language evolution in computer-mediated communication reproduces existing fault lines in spoken American English.

empirical methods in natural language processing | 2016

Demographic Dialectal Variation in Social Media: A Case Study of African-American English.

Su Lin Blodgett; Lisa Green; Brendan O'Connor

Though dialectal language is increasingly abundant on social media, few resources exist for developing NLP tools to handle such language. We conduct a case study of dialectal language in online conversational text by investigating African-American English (AAE) on Twitter. We propose a distantly supervised model to identify AAE-like language from demographics associated with geo-located messages, and we verify that this language follows well-known AAE linguistic phenomena. In addition, we analyze the quality of existing language identification and dependency parsing tools on AAE-like text, demonstrating that they perform poorly on such text compared to text associated with white speakers. We also provide an ensemble classifier for language identification which eliminates this disparity and release a new corpus of tweets containing AAE-like language.

international conference on computational linguistics | 2014

CMU: Arc-Factored, Discriminative Semantic Dependency Parsing

Sam Thomson; Brendan O'Connor; Jeffrey Flanigan; David Bamman; Jesse Dodge; Swabha Swayamdipta; Nathan Schneider; Chris Dyer; Noah A. Smith

We present an arc-factored statistical model for semantic dependency parsing, as defined by the SemEval 2014 Shared Task 8 on Broad-Coverage Semantic Dependency Parsing. Our entry in the open track placed second in the competition.

Proceedings of the Workshop on Interactive Language Learning, Visualization, and Interfaces | 2014

MiTextExplorer: Linked brushing and mutual information for exploratory text data analysis

Brendan O'Connor

In this paper I describe a preliminary experimental system, MiTextExplorer, for textual linked brushing, which allows an analyst to interactively explore statistical relationships between (1) terms, and (2) document metadata (covariates). An analyst can graphically select documents embedded in a temporal, spatial, or other continuous space, and the tool reports terms with strong statistical associations for the region. The user can then drill down to specific terms and term groupings, viewing further associations, and see how terms are used in context. The goal is to rapidly compare language usage across interesting document covariates. I illustrate examples of using the tool on several datasets: geo-located Twitter messages, presidential State of the Union addresses, the ACL Anthology, and the King James Bible.
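The idea of reporting terms strongly associated with a selected group of documents can be sketched with pointwise mutual information (PMI), one common association measure. The abstract does not give the exact statistic or thresholds the tool uses, so the scoring formula, the `pmi_terms` function, its `min_count` parameter, and the toy documents below are all assumptions made for illustration.

```python
import math
from collections import Counter


def pmi_terms(docs, selected, min_count=1):
    """Rank terms by PMI with the selected documents.

    docs: list of token lists; selected: set of document indices.
    PMI(term) = log( P(term | selected) / P(term) ), using token counts.
    """
    total = Counter()
    in_selection = Counter()
    for i, tokens in enumerate(docs):
        total.update(tokens)
        if i in selected:
            in_selection.update(tokens)

    n_total = sum(total.values())
    n_selected = sum(in_selection.values())

    scores = {}
    for term, count in in_selection.items():
        if count < min_count:
            continue  # skip rare terms, whose PMI estimates are noisy
        p_term_given_selected = count / n_selected
        p_term = total[term] / n_total
        scores[term] = math.log(p_term_given_selected / p_term)
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)


# Hypothetical toy corpus; documents 0 and 1 stand in for a brushed region.
docs = [
    ["snow", "cold", "day"],
    ["snow", "ice", "cold"],
    ["sun", "beach", "day"],
    ["sun", "warm", "day"],
]
ranked = pmi_terms(docs, selected={0, 1})
for term, score in ranked:
    print(f"{term}\t{score:+.3f}")
```

In this toy corpus, terms that occur only in the selected documents score highest, while "day", which is more frequent outside the selection, gets a negative score. Raising `min_count` trades coverage for stability, since PMI tends to overrate terms seen only once or twice.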

international conference on weblogs and social media | 2010

From Tweets to Polls: Linking Text Sentiment to Public Opinion Time Series

Brendan O'Connor; Ramnath Balasubramanyan; Bryan R. Routledge; Noah A. Smith

meeting of the association for computational linguistics | 2011

Part-of-Speech Tagging for Twitter: Annotation, Features, and Experiments

Kevin Gimpel; Nathan Schneider; Brendan O'Connor; Dipanjan Das; Daniel Mills; Jacob Eisenstein; Michael Heilman; Dani Yogatama; Jeffrey Flanigan; Noah A. Smith

empirical methods in natural language processing | 2010