William L. Hamilton
McGill University
Network
Latest external collaboration on country level. Dive into details by clicking on the dots.
Publication
Featured researches published by William L. Hamilton.
meeting of the association for computational linguistics | 2016
William L. Hamilton; Jure Leskovec; Daniel Jurafsky
Understanding how words change their meanings over time is key to models of language and cultural evolution, but historical data on meaning is scarce, making theories hard to develop and test. Word embeddings show promise as a diachronic tool, but have not been carefully evaluated. We develop a robust methodology for quantifying semantic change by evaluating word embeddings (PPMI, SVD, word2vec) against known historical changes. We then use this methodology to reveal statistical laws of semantic evolution. Using six historical corpora spanning four languages and two centuries, we propose two quantitative laws of semantic change: (i) the law of conformity---the rate of semantic change scales with an inverse power-law of word frequency; (ii) the law of innovation---independent of frequency, words that are more polysemous have higher rates of semantic change.
Proceedings of the National Academy of Sciences of the United States of America | 2017
Rob Voigt; Nicholas P. Camp; Vinodkumar Prabhakaran; William L. Hamilton; Rebecca C. Hetey; Camilla M. Griffiths; David Jurgens; Daniel Jurafsky; Jennifer L. Eberhardt
Significance Police officers speak significantly less respectfully to black than to white community members in everyday traffic stops, even after controlling for officer race, infraction severity, stop location, and stop outcome. This paper presents a systematic analysis of officer body-worn camera footage, using computational linguistic techniques to automatically measure the respect level that officers display to community members. This work demonstrates that body camera footage can be used as a rich source of data rather than merely archival evidence, and paves the way for developing powerful language-based tools for studying and potentially improving police–community relations. Using footage from body-worn cameras, we analyze the respectfulness of police officer language toward white and black community members during routine traffic stops. We develop computational linguistic methods that extract levels of respect automatically from transcripts, informed by a thin-slicing study of participant ratings of officer utterances. We find that officers speak with consistently less respect toward black versus white community members, even after controlling for the race of the officer, the severity of the infraction, the location of the stop, and the outcome of the stop. Such disparities in common, everyday interactions between police and the communities they serve have important implications for procedural justice and the building of police–community trust.
empirical methods in natural language processing | 2016
William L. Hamilton; Kevin Clark; Jure Leskovec; Daniel Jurafsky
A words sentiment depends on the domain in which it is used. Computational social science research thus requires sentiment lexicons that are specific to the domains being studied. We combine domain-specific word embeddings with a label propagation framework to induce accurate domain-specific sentiment lexicons using small sets of seed words. We show that our approach achieves state-of-the-art performance on inducing sentiment lexicons from domain-specific corpora and that our purely corpus-based approach outperforms methods that rely on hand-curated resources (e.g., WordNet). Using our framework, we induce and release historical sentiment lexicons for 150 years of English and community-specific sentiment lexicons for 250 online communities from the social media forum Reddit. The historical lexicons we induce show that more than 5% of sentiment-bearing (non-neutral) English words completely switched polarity during the last 150 years, and the community-specific lexicons highlight how sentiment varies drastically between different communities.
meeting of the association for computational linguistics | 2016
Vinodkumar Prabhakaran; William L. Hamilton; Daniel A. McFarland; Daniel Jurafsky
Computationally modeling the evolution of science by tracking how scientific topics rise and fall over time has important implications for research funding and public policy. However, little is known about the mechanisms underlying topic growth and decline. We investigate the role of rhetorical framing: whether the rhetorical role or function that authors ascribe to topics (as methods, as goals, as results, etc.) relates to the historical trajectory of the topics. We train topic models and a rhetorical function classifier to map topic models onto their rhetorical roles in 2.4 million abstracts from the Web of Science from 1991-2010. We find that a topic’s rhetorical function is highly predictive of its eventual growth or decline. For example, topics that are rhetorically described as results tend to be in decline, while topics that function as methods tend to be in early phases of growth.
international world wide web conferences | 2018
Srijan Kumar; William L. Hamilton; Jure Leskovec; Daniel Jurafsky
Users organize themselves into communities on web platforms. These communities can interact with one another, often leading to conflicts and toxic interactions. However, little is known about the mechanisms of interactions between communities and how they impact users. Here we study intercommunity interactions across 36,000 communities on Reddit, examining cases where users of one community are mobilized by negative sentiment to comment in another community. We show that such conflicts tend to be initiated by a handful of communities---less than 1% of communities start 74% of conflicts. While conflicts tend to be initiated by highly active community members, they are carried out by significantly less active members. We find that conflicts are marked by formation of echo chambers, where users primarily talk to other users from their own community. In the long-term, conflicts have adverse effects and reduce the overall activity of users in the targeted communities. Our analysis of user interactions also suggests strategies for mitigating the negative impact of conflicts---such as increasing direct engagement between attackers and defenders. Further, we accurately predict whether a conflict will occur by creating a novel LSTM model that combines graph embeddings, user, community, and text features. This model can be used to create an early-warning system for community moderators to prevent conflicts. Altogether, this work presents a data-driven view of community interactions and conflict, and paves the way towards healthier online communities.
computational social science | 2016
Alex Wang; William L. Hamilton; Jure Leskovec
Understanding the ways in which users interact with different online communities is crucial to social network analysis and community maintenance. We present an unsupervised neural model to learn linguistic descriptors for a user’s behavior over time within an online community. We show that the descriptors learned by our model capture the functional roles that users occupy in communities, in contrast to those learned via a standard topic-modeling algorithm, which simply reflect topical content. Experiments on the social media forum Reddit show how the model can provide interpretable insights into user behavior. Our model uncovers linguistic differences that correlate with user activity levels and community clustering.
knowledge discovery and data mining | 2018
Rex Ying; Ruining He; Kaifeng Chen; Pong Eksombatchai; William L. Hamilton; Jure Leskovec
Recent advancements in deep neural networks for graph-structured data have led to state-of-the-art performance on recommender system benchmarks. However, making these methods practical and scalable to web-scale recommendation tasks with billions of items and hundreds of millions of users remains an unsolved challenge. Here we describe a large-scale deep recommendation engine that we developed and deployed at Pinterest. We develop a data-efficient Graph Convolutional Network (GCN) algorithm, which combines efficient random walks and graph convolutions to generate embeddings of nodes (i.e., items) that incorporate both graph structure as well as node feature information. Compared to prior GCN approaches, we develop a novel method based on highly efficient random walks to structure the convolutions and design a novel training strategy that relies on harder-and-harder training examples to improve robustness and convergence of the model. We also develop an efficient MapReduce model inference algorithm to generate embeddings using a trained model. Overall, we can train on and embed graphs that are four orders of magnitude larger than typical GCN implementations. We show how GCN embeddings can be used to make high-quality recommendations in various settings at Pinterest, which has a massive underlying graph with 3 billion nodes representing pins and boards, and 17 billion edges. According to offline metrics, user studies, as well as A/B tests, our approach generates higher-quality recommendations than comparable deep learning based systems. To our knowledge, this is by far the largest application of deep graph embeddings to date and paves the way for a new generation of web-scale recommender systems based on graph convolutional architectures.
neural information processing systems | 2017
William L. Hamilton; Zhitao Ying; Jure Leskovec
IEEE Data(base) Engineering Bulletin | 2017
William L. Hamilton; Rex Ying; Jure Leskovec
international conference on machine learning | 2013
William L. Hamilton; Mahdi Milani Fard; Joelle Pineau