
Publication


Featured research published by Sharad Goel.


Proceedings of the National Academy of Sciences of the United States of America | 2010

Predicting consumer behavior with Web search

Sharad Goel; Jake M. Hofman; Sébastien Lahaie; David M. Pennock; Duncan J. Watts

Recent work has demonstrated that Web search volume can “predict the present,” meaning that it can be used to accurately track outcomes such as unemployment levels, auto and home sales, and disease prevalence in near real time. Here we show that what consumers are searching for online can also predict their collective future behavior days or even weeks in advance. Specifically we use search query volume to forecast the opening weekend box-office revenue for feature films, first-month sales of video games, and the rank of songs on the Billboard Hot 100 chart, finding in all cases that search counts are highly predictive of future outcomes. We also find that search counts generally boost the performance of baseline models fit on other publicly available data, where the boost varies from modest to dramatic, depending on the application in question. Finally, we reexamine previous work on tracking flu trends and show that, perhaps surprisingly, the utility of search data relative to a simple autoregressive model is modest. We conclude that in the absence of other data sources, or where small improvements in predictive performance are material, search queries provide a useful guide to the near future.


Proceedings of the National Academy of Sciences of the United States of America | 2010

Assessing respondent-driven sampling

Sharad Goel; Matthew J. Salganik

Respondent-driven sampling (RDS) is a network-based technique for estimating traits in hard-to-reach populations, for example, the prevalence of HIV among drug injectors. In recent years RDS has been used in more than 120 studies in more than 20 countries and by leading public health organizations, including the Centers for Disease Control and Prevention in the United States. Despite the widespread use and growing popularity of RDS, there has been little empirical validation of the methodology. Here we investigate the performance of RDS by simulating sampling from 85 known, network populations. Across a variety of traits we find that RDS is substantially less accurate than generally acknowledged and that reported RDS confidence intervals are misleadingly narrow. Moreover, because we model a best-case scenario in which the theoretical RDS sampling assumptions hold exactly, it is unlikely that RDS performs any better in practice than in our simulations. Notably, the poor performance of RDS is driven not by the bias but by the high variance of estimates, a possibility that had been largely overlooked in the RDS literature. Given the consistency of our results across networks and our generous sampling conditions, we conclude that RDS as currently practiced may not be suitable for key aspects of public health surveillance where it is now extensively applied.


Statistics in Medicine | 2009

Respondent‐driven sampling as Markov chain Monte Carlo

Sharad Goel; Matthew J. Salganik

Respondent-driven sampling (RDS) is a recently introduced, and now widely used, technique for estimating disease prevalence in hidden populations. RDS data are collected through a snowball mechanism, in which current sample members recruit future sample members. In this paper we present RDS as Markov chain Monte Carlo importance sampling, and we examine the effects of community structure and the recruitment procedure on the variance of RDS estimates. Past work has assumed that the variance of RDS estimates is primarily affected by segregation between healthy and infected individuals. We examine an illustrative model to show that this is not necessarily the case, and that bottlenecks anywhere in the networks can substantially affect estimates. We also show that variance is inflated by a common design feature in which the sample members are encouraged to recruit multiple future sample members. The paper concludes with suggestions for implementing and evaluating RDS studies.
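The random-walk view of RDS described above can be illustrated with a toy simulation. The following is a minimal sketch, not the authors' code: it treats recruitment as a single-referral random walk on a fully known network and applies inverse-degree reweighting (in the style of the common Volz-Heckathorn estimator), since a random walk visits nodes in proportion to their degree. The network and all names are illustrative.

```python
import random

def rds_walk(adj, seed, n_samples, rng=random.Random(0)):
    """Idealized RDS as a random walk: each respondent recruits
    one uniformly random neighbor (with-replacement sampling)."""
    sample, node = [], seed
    for _ in range(n_samples):
        sample.append(node)
        node = rng.choice(adj[node])
    return sample

def rds_estimate(sample, adj, infected):
    """Inverse-degree-weighted prevalence estimate: a random walk
    oversamples high-degree nodes, so weight each draw by 1/degree."""
    num = sum(1.0 / len(adj[v]) for v in sample if v in infected)
    den = sum(1.0 / len(adj[v]) for v in sample)
    return num / den
```

On a complete graph (all degrees equal) the estimate reduces to the raw sample proportion; bottlenecks in less homogeneous networks are what inflate its variance, as the paper argues.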


Management Science | 2015

The Structural Virality of Online Diffusion

Sharad Goel; Ashton Anderson; Jake M. Hofman; Duncan J. Watts

Viral products and ideas are intuitively understood to grow through a person-to-person diffusion process analogous to the spread of an infectious disease; however, until recently it has been prohibitively difficult to directly observe purportedly viral events, and thus to rigorously quantify or characterize their structural properties. Here we propose a formal measure of what we label “structural virality” that interpolates between two conceptual extremes: content that gains its popularity through a single, large broadcast and that which grows through multiple generations with any one individual directly responsible for only a fraction of the total adoption. We use this notion of structural virality to analyze a unique data set of a billion diffusion events on Twitter, including the propagation of news stories, videos, images, and petitions. We find that across all domains and all sizes of events, online diffusion is characterized by surprising structural diversity; that is, popular events regularly grow via both broadcast and viral mechanisms, as well as essentially all conceivable combinations of the two. Nevertheless, we find that structural virality is typically low, and remains so independent of size, suggesting that popularity is largely driven by the size of the largest broadcast. Finally, we attempt to replicate these findings with a model of contagion characterized by a low infection rate spreading on a scale-free network. We find that although several of our empirical findings are consistent with such a model, it fails to replicate the observed diversity of structural virality, thereby suggesting new directions for future modeling efforts. This paper was accepted by Lorin Hitt, information systems.
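The measure proposed in this paper, structural virality, is the average shortest-path distance between all pairs of nodes in the diffusion tree (a normalized Wiener index). A minimal sketch of that computation via repeated BFS, assuming the tree is given as an adjacency dict (illustrative representation, not the paper's code):

```python
from collections import deque

def structural_virality(adj):
    """Average pairwise shortest-path distance over all node pairs
    of a diffusion tree given as an adjacency dict."""
    nodes = list(adj)
    n = len(nodes)
    total = 0
    for src in nodes:
        dist = {src: 0}          # BFS distances from src
        q = deque([src])
        while q:
            v = q.popleft()
            for u in adj[v]:
                if u not in dist:
                    dist[u] = dist[v] + 1
                    q.append(u)
        total += sum(dist.values())
    return total / (n * (n - 1))
```

A pure broadcast (star) approaches a value of 2 as the tree grows, while a long chain of person-to-person passes scores much higher, which is exactly the low-to-high spectrum the measure interpolates.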


Sexually Transmitted Infections | 2012

Respondent driven sampling: where we are and where should we be going?

Richard G. White; Amy Lansky; Sharad Goel; David Wilson; Wolfgang Hladik; Avi Hakim; Simon Dw Frost

Respondent Driven Sampling (RDS) is a novel variant of link tracing sampling that has primarily been used to estimate the characteristics of hard-to-reach groups, such as the HIV prevalence of drug users.1 ‘Seeds’ are selected by convenience from a population of interest (target population) and given coupons. Seeds then use these coupons to recruit other people, who themselves become recruiters. Recruits are given compensation, usually money, for taking part in the survey and also an incentive for recruiting others. This process continues in recruitment ‘waves’ until the survey is stopped. Estimation methods are then applied to account for the biased recruitment, for example, the presumed over-recruitment of people with more acquaintances, in an attempt to generate estimates for the underlying population. RDS has quickly become popular and relied on by major public health organisations, including the US Centers for Disease Control and Prevention and Family Health International, chiefly because it is often found to be an efficient method of recruitment in hard-to-reach groups, but also because of the availability of custom-written software incorporating inference methods that are designed to generate estimates that are representative of the wider population of interest, despite the biased sampling. As demonstrated by RDS's popularity,1 there was a clear need for new methods of data collection on hard-to-reach groups. However, RDS has not been without its critics. Its reliance on the target population for recruitment introduced ethical and sampling concerns. If RDS estimates are overly biased or the variance is unacceptably high, then RDS will be little more than another method of convenience sampling. If these errors can be minimised, however, then RDS has the potential to become a very useful survey methodology. In this editorial we highlight that ‘RDS’ includes both data collection and statistical inference methods, discuss the limitations …


The Annals of Applied Statistics | 2008

Horseshoes in multidimensional scaling and local kernel methods

Persi Diaconis; Sharad Goel; Susan Holmes

Classical multidimensional scaling (MDS) is a method for visualizing high-dimensional point clouds by mapping them to low-dimensional Euclidean space. This mapping is defined in terms of eigenfunctions of a matrix of interpoint dissimilarities. In this paper we analyze in detail multidimensional scaling applied to a specific dataset: the 2005 United States House of Representatives roll call votes. Certain MDS and kernel projections output “horseshoes” that are characteristic of dimensionality reduction techniques. We show that, in general, a latent ordering of the data gives rise to these patterns when one only has local information, that is, when only the interpoint distances for nearby points are known accurately. Our results provide rigorous insight into manifold learning in the special case where the manifold is a curve.
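The classical MDS procedure the paper analyzes is standard: double-center the squared distance matrix and embed via the leading eigenvectors. A minimal NumPy sketch (not the paper's code; function name is illustrative):

```python
import numpy as np

def classical_mds(D, k=2):
    """Classical MDS: double-center the squared distance matrix,
    then embed using the top-k eigenpairs of B = -0.5 * J D^2 J."""
    n = D.shape[0]
    J = np.eye(n) - np.ones((n, n)) / n     # centering matrix
    B = -0.5 * J @ (D ** 2) @ J
    vals, vecs = np.linalg.eigh(B)          # eigh returns ascending order
    idx = np.argsort(vals)[::-1][:k]        # pick the k largest eigenvalues
    L = np.sqrt(np.clip(vals[idx], 0, None))
    return vecs[:, idx] * L                 # coordinates scaled by sqrt(eigenvalue)
```

When the dissimilarities are exact Euclidean distances of a one-dimensional configuration, this recovers the latent ordering exactly; horseshoes emerge when only local distances are reliable, as the paper shows.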


Electronic Commerce | 2010

Prediction without markets

Sharad Goel; Daniel M. Reeves; Duncan J. Watts; David M. Pennock

Citing recent successes in forecasting elections, movies, products, and other outcomes, prediction market advocates call for widespread use of market-based methods for government and corporate decision making. Though theoretical and empirical evidence suggests that markets do often outperform alternative mechanisms, less attention has been paid to the magnitude of improvement. Here we compare the performance of prediction markets to conventional methods of prediction, namely polls and statistical models. Examining thousands of sporting and movie events, we find that the relative advantage of prediction markets is surprisingly small, as measured by squared error, calibration, and discrimination. Moreover, these domains also exhibit remarkably steep diminishing returns to information, with nearly all the predictive power captured by only two or three parameters. As policy makers consider adoption of prediction markets, costs should be weighed against potentially modest benefits.


Marketing Science | 2014

Predicting Individual Behavior with Social Networks

Sharad Goel; Daniel G. Goldstein

With the availability of social network data, it has become possible to relate the behavior of individuals to that of their acquaintances on a large scale. Although the similarity of connected individuals is well established, it is unclear whether behavioral predictions based on social data are more accurate than those arising from current marketing practices. We employ a communications network of over 100 million people to forecast highly diverse behaviors, from patronizing an off-line department store to responding to advertising to joining a recreational league. Across all domains, we find that social data are informative in identifying individuals who are most likely to undertake various actions, and moreover, such data improve on both demographic and behavioral models. There are, however, limits to the utility of social data. In particular, when rich transactional data were available, social data did little to improve prediction.


Knowledge Discovery and Data Mining | 2017

Algorithmic Decision Making and the Cost of Fairness

Sam Corbett-Davies; Emma Pierson; Avi Feller; Sharad Goel; Aziz Z. Huq

Algorithms are now regularly used to decide whether defendants awaiting trial are too dangerous to be released back into the community. In some cases, black defendants are substantially more likely than white defendants to be incorrectly classified as high risk. To mitigate such disparities, several techniques have recently been proposed to achieve algorithmic fairness. Here we reformulate algorithmic fairness as constrained optimization: the objective is to maximize public safety while satisfying formal fairness constraints designed to reduce racial disparities. We show that for several past definitions of fairness, the optimal algorithms that result require detaining defendants above race-specific risk thresholds. We further show that the optimal unconstrained algorithm requires applying a single, uniform threshold to all defendants. The unconstrained algorithm thus maximizes public safety while also satisfying one important understanding of equality: that all individuals are held to the same standard, irrespective of race. Because the optimal constrained and unconstrained algorithms generally differ, there is tension between improving public safety and satisfying prevailing notions of algorithmic fairness. By examining data from Broward County, Florida, we show that this trade-off can be large in practice. We focus on algorithms for pretrial release decisions, but the principles we discuss apply to other domains, and also to human decision makers carrying out structured decision rules.
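The threshold framing in this abstract can be made concrete. The following is a hedged sketch with toy risk scores and hypothetical thresholds, not the paper's code or data: the optimal unconstrained rule detains everyone above one uniform risk threshold, while the paper shows that satisfying formal fairness constraints generally forces group-specific thresholds.

```python
import numpy as np

def detain_uniform(risk, t):
    """Unconstrained rule: one threshold applied to all defendants."""
    return risk >= t

def detain_group_thresholds(risk, group, thresholds):
    """Constrained rule: a separate risk threshold per group, as the
    fairness definitions discussed in the paper generally require."""
    t = np.array([thresholds[g] for g in group])
    return risk >= t
```

Comparing the two rules on the same risk scores exposes the trade-off the paper quantifies: any divergence between the group-specific and uniform thresholds changes who is detained, and hence both public safety and disparity.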


The Annals of Applied Statistics | 2016

Precinct or Prejudice? Understanding Racial Disparities in New York City's Stop-and-Frisk Policy

Sharad Goel; Justin M. Rao; Ravi Shroff

Recent studies have examined racial disparities in stop-and-frisk, a widely employed but controversial policing tactic. The statistical evidence, however, has been limited and contradictory. We investigate by analyzing three million stops in New York City over five years, focusing on cases where officers suspected the stopped individual of criminal possession of a weapon (CPW). For each CPW stop, we estimate the ex ante probability that the detained suspect has a weapon. We find that in more than 40% of cases, the likelihood of finding a weapon (typically a knife) was less than 1%, raising concerns that the legal requirement of “reasonable suspicion” was often not met. We further find that blacks and Hispanics were disproportionately stopped in these low hit rate contexts, a phenomenon that we trace to two factors: (1) lower thresholds for stopping individuals, regardless of race, in high-crime, predominantly minority areas, particularly public housing; and (2) lower thresholds for stopping minorities relative to similarly situated whites. Finally, we demonstrate that by conducting only the 6% of stops that are statistically most likely to result in weapons seizure, one can both recover the majority of weapons and mitigate racial disparities in who is stopped. We show that this statistically informed stopping strategy can be approximated by simple, easily implemented heuristics with little loss in efficiency.
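The “statistically informed stopping strategy” can be sketched in the abstract: rank candidate stops by an estimated hit probability and conduct only the top fraction. A minimal illustration with hypothetical probabilities (function name and data are illustrative; ties at the cutoff may select slightly more than the target fraction):

```python
import numpy as np

def top_fraction_stops(p_hat, frac):
    """Boolean mask selecting the fraction of candidate stops with the
    highest estimated probability of recovering a weapon."""
    k = max(1, int(round(frac * len(p_hat))))
    cutoff = np.sort(p_hat)[::-1][k - 1]    # k-th largest probability
    return p_hat >= cutoff
```

Because predicted hit probabilities are heavily right-skewed in the paper's data, conducting only the highest-probability 6% of stops still recovers most weapons.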
