Publication


Featured research published by Hanna M. Wallach.


International World Wide Web Conferences | 2017

Auditing Search Engines for Differential Satisfaction Across Demographics

Rishabh Mehrotra; Ashton Anderson; Fernando Diaz; Amit Sharma; Hanna M. Wallach; Emine Yilmaz

Many online services, such as search engines, social media platforms, and digital marketplaces, are advertised as being available to any user, regardless of their age, gender, or other demographic factors. However, there are growing concerns that these services may systematically underserve some groups of users. In this paper, we present a framework for internally auditing such services for differences in user satisfaction across demographic groups, using search engines as a case study. We first explain the pitfalls of naively comparing the behavioral metrics that are commonly used to evaluate search engines. We then propose three methods for measuring latent differences in user satisfaction from observed differences in evaluation metrics. To develop these methods, we drew on ideas from the causal inference literature and the multilevel modeling literature. Our framework is broadly applicable to other online services, and provides general insight into interpreting their evaluation metrics.


Social Informatics | 2016

The Social Dynamics of Language Change in Online Networks

Rahul Goel; Sandeep Soni; Naman Goyal; John Paparrizos; Hanna M. Wallach; Fernando Diaz; Jacob Eisenstein

Language change is a complex social phenomenon, revealing pathways of communication and sociocultural influence. But, while language change has long been a topic of study in sociolinguistics, traditional linguistic research methods rely on circumstantial evidence, estimating the direction of change from differences between older and younger speakers. In this paper, we use a data set of several million Twitter users to track language changes in progress. First, we show that language change can be viewed as a form of social influence: we observe complex contagion for phonetic spellings and “netspeak” abbreviations (e.g., lol), but not for older dialect markers from spoken language. Next, we test whether specific types of social network connections are more influential than others, using a parametric Hawkes process model. We find that tie strength plays an important role: densely embedded social ties are significantly better conduits of linguistic influence. Geographic locality appears to play a more limited role: we find relatively little evidence to support the hypothesis that individuals are more influenced by geographically local social ties, even in their usage of geographical dialect markers.


Computational Social Science | 2016

Bag of What? Simple Noun Phrase Extraction for Text Analysis

Abram Handler; Matthew James Denny; Hanna M. Wallach; Brendan O'Connor

Social scientists who do not have specialized natural language processing training often use a unigram bag-of-words (BOW) representation when analyzing text corpora. We offer a new phrase-based method, NPFST, for enriching a unigram BOW. NPFST uses a part-of-speech tagger and a finite state transducer to extract multiword phrases to be added to a unigram BOW. We compare NPFST to both n-gram and parsing methods in terms of yield, recall, and efficiency. We then demonstrate how to use NPFST for exploratory analyses; it performs well, without configuration, on many different kinds of English text. Finally, we present a case study using NPFST to analyze a new corpus of U.S. congressional bills.


Child Abuse & Neglect | 2016

Characterization of contact offenders and child exploitation material trafficking on five peer-to-peer networks

George Dean Bissias; Brian Neil Levine; Marc Liberatore; Brian Lynn; Juston Moore; Hanna M. Wallach; Janis Wolak

We provide detailed measurement of the illegal trade in child exploitation material (CEM, also known as child pornography) from mid-2011 through 2014 on five popular peer-to-peer (P2P) file sharing networks. We characterize several observations: counts of peers trafficking in CEM; the proportion of arrested traffickers that were identified during the investigation as committing contact sexual offenses against children; trends in the trafficking of sexual images of sadistic acts and infants or toddlers; the relationship between such content and contact offenders; and survival rates of CEM. In the 5 P2P networks we examined, we estimate there were recently about 840,000 unique installations per month of P2P programs sharing CEM worldwide. We estimate that about 3 in 10,000 Internet users worldwide were sharing CEM in a given month; rates vary per country. We found an overall month-to-month decline in trafficking of CEM during our study. By surveying law enforcement we determined that 9.5% of persons arrested for P2P-based CEM trafficking on the studied networks were identified during the investigation as having sexually offended against children offline. Rates per network varied, ranging from 8% of arrests for CEM trafficking on Gnutella to 21% on BitTorrent. Within BitTorrent, where law enforcement applied their own measure of content severity, the rate of contact offenses among peers sharing the most-severe CEM (29%) was higher than those sharing the least-severe CEM (15%). Although the persistence of CEM on the networks varied, it generally survived for long periods of time; e.g., BitTorrent CEM had a survival rate near 100%.


Empirical Methods in Natural Language Processing | 2016

Detecting and Characterizing Events

Allison June-Barlow Chaney; Hanna M. Wallach; Matthew Connelly; David M. Blei

Significant events are characterized by interactions between entities (such as countries, organizations, or individuals) that deviate from typical interaction patterns. Analysts, including historians, political scientists, and journalists, commonly read large quantities of text to construct an accurate picture of when and where an event happened, who was involved, and in what ways. In this paper, we present the Capsule model for analyzing documents to detect and characterize events of potential significance. Specifically, we develop a model based on topic modeling that distinguishes between topics that describe “business as usual” and topics that deviate from these patterns. To demonstrate this model, we analyze a corpus of over two million U.S. State Department cables from the 1970s. We provide an open-source implementation of an inference algorithm for the model and a pipeline for exploring its results.


Archive | 2016

Inference on the Effects of Observed Features in Latent Space Models for Networks

Zachary M. Jones; Matthew James Denny; Bruce A. Desmarais; Hanna M. Wallach

The latent space model (LSM) for network data is a generative probabilistic model that combines a generalized linear model with a latent spatial embedding of the network. It has been used to decrease error in the estimation of and inference regarding the effects of observed covariates. In applications of the LSM, it is assumed that the latent spatial embedding can control for unmeasured confounding structure that is related to the values of edges in the network. As far as we know, there has been no research that considers the LSM’s performance in adjusting for unmeasured structure to reduce estimation and inferential errors. We investigate the LSM’s performance via a Monte Carlo study. In the presence of an unmeasured covariate that can be appropriately modeled using a latent space, estimation and inferential error remain high under even moderate confounding. However, the prediction error of the LSM when unmeasured network structure is present is substantially lower in most cases. We conclude that the LSM is most appropriately used for exploratory or predictive tasks.


International Conference on Artificial Intelligence and Statistics | 2010

An Alternative Prior Process for Nonparametric Bayesian Clustering

Hanna M. Wallach; Shane T. Jensen; Lee H. Dicker; Katherine A. Heller


Knowledge Discovery and Data Mining | 2015

Bayesian Poisson Tensor Factorization for Inferring Multilateral Relations from Sparse Dyadic Event Counts

Aaron Schein; John Paisley; David M. Blei; Hanna M. Wallach


International Conference on Artificial Intelligence and Statistics | 2015

The Bayesian Echo Chamber: Modeling Social Influence via Linguistic Accommodation

Fangjian Guo; Charles Blundell; Hanna M. Wallach; Katherine A. Heller


Neural Information Processing Systems | 2016

Poisson-Gamma dynamical systems

Aaron Schein; Hanna M. Wallach; Mingyuan Zhou

Collaboration


Dive into Hanna M. Wallach's collaborations.

Top Co-Authors

Aaron Schein
University of Massachusetts Amherst

Bruce A. Desmarais
Pennsylvania State University

Matthew James Denny
Pennsylvania State University

Mingyuan Zhou
University of Texas at Austin