Lars Backstrom | Researchain

Publication

Featured research published by Lars Backstrom.

knowledge discovery and data mining | 2006

Group formation in large social networks: membership, growth, and evolution

Lars Backstrom; Daniel P. Huttenlocher; Jon M. Kleinberg; Xiangyang Lan

The processes by which communities come together, attract new members, and develop over time are a central research issue in the social sciences - political movements, professional organizations, and religious denominations all provide fundamental examples of such communities. In the digital domain, on-line groups are becoming increasingly prominent due to the growth of community and social networking sites such as MySpace and LiveJournal. However, the challenge of collecting and analyzing large-scale time-resolved data on social groups and communities has left most basic questions about the evolution of such groups largely unresolved: what are the structural features that influence whether individuals will join communities, which communities will grow rapidly, and how do the overlaps among pairs of communities change over time? Here we address these questions using two large sources of data: friendship links and community membership on LiveJournal, and co-authorship and conference publications in DBLP. Both of these datasets provide explicit user-defined communities, where conferences serve as proxies for communities in DBLP. We study how the evolution of these communities relates to properties such as the structure of the underlying social networks. We find that the propensity of individuals to join communities, and of communities to grow rapidly, depends in subtle ways on the underlying network structure. For example, the tendency of an individual to join a community is influenced not just by the number of friends he or she has within the community, but also crucially by how those friends are connected to one another. We use decision-tree techniques to identify the most significant structural determinants of these properties. We also develop a novel methodology for measuring movement of individuals between communities, and show how such movements are closely aligned with changes in the topics of interest within the communities.

international world wide web conferences | 2009

Mapping the world's photos

David J. Crandall; Lars Backstrom; Daniel P. Huttenlocher; Jon M. Kleinberg

We investigate how to organize a large collection of geotagged photos, working with a dataset of about 35 million images collected from Flickr. Our approach combines content analysis based on text tags and image data with structural analysis based on geospatial data. We use the spatial distribution of where people take photos to define a relational structure between the photos that are taken at popular places. We then study the interplay between this structure and the content, using classification methods for predicting such locations from visual, textual and temporal features of the photos. We find that visual and temporal features improve the ability to estimate the location of a photo, compared to using just textual features. We illustrate using these techniques to organize a large photo collection, while also revealing various interesting properties about popular cities and landmarks at a global scale.

international world wide web conferences | 2007

Wherefore art thou r3579x?: anonymized social networks, hidden patterns, and structural steganography

Lars Backstrom; Cynthia Dwork; Jon M. Kleinberg

In a social network, nodes correspond to people or other social entities, and edges correspond to social links between them. In an effort to preserve privacy, the practice of anonymization replaces names with meaningless unique identifiers. We describe a family of attacks such that even from a single anonymized copy of a social network, it is possible for an adversary to learn whether edges exist or not between specific targeted pairs of nodes.

web search and data mining | 2011

Supervised random walks: predicting and recommending links in social networks

Lars Backstrom; Jure Leskovec

Predicting the occurrence of links is a fundamental problem in networks. In the link prediction problem we are given a snapshot of a network and would like to infer which interactions among existing members are likely to occur in the near future or which existing interactions are we missing. Although this problem has been extensively studied, the challenge of how to effectively combine the information from the network structure with rich node and edge attribute data remains largely open. We develop an algorithm based on Supervised Random Walks that naturally combines the information from the network structure with node and edge level attributes. We achieve this by using these attributes to guide a random walk on the graph. We formulate a supervised learning task where the goal is to learn a function that assigns strengths to edges in the network such that a random walker is more likely to visit the nodes to which new links will be created in the future. We develop an efficient training algorithm to directly learn the edge strength estimation function. Our experiments on the Facebook social graph and large collaboration networks show that our approach outperforms state-of-the-art unsupervised approaches as well as approaches that are based on feature extraction.
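The attribute-guided walk described in this abstract can be sketched roughly as follows. This is a minimal illustration, not the authors' implementation: it assumes a logistic edge-strength function, a random walk with restart, and hand-set weights, and it leaves out the training step that learns the weights. The names `edge_strength`, `walk_scores`, and `undirected` are hypothetical.

```python
import math


def edge_strength(features, weights):
    """Map an edge's feature vector to a positive strength (assumed logistic form)."""
    z = sum(f * w for f, w in zip(features, weights))
    return 1.0 / (1.0 + math.exp(-z))


def undirected(pairs):
    """Expand {(u, v): features} into a directed edge dict with both directions."""
    edges = {}
    for (u, v), feats in pairs.items():
        edges[(u, v)] = feats
        edges[(v, u)] = feats
    return edges


def walk_scores(edges, weights, source, restart=0.3, iterations=100):
    """Approximate visit probabilities of a random walk with restart at `source`.

    Transition probabilities are edge strengths normalized over each node's
    outgoing edges. The weights are fixed here; learning them is not shown.
    """
    strength = {e: edge_strength(f, weights) for e, f in edges.items()}
    nodes = sorted({n for e in edges for n in e})
    out_total = {n: 0.0 for n in nodes}
    for (u, _), s in strength.items():
        out_total[u] += s

    p = {n: 1.0 if n == source else 0.0 for n in nodes}
    for _ in range(iterations):
        nxt = {n: 0.0 for n in nodes}
        for (u, v), s in strength.items():
            nxt[v] += (1.0 - restart) * p[u] * s / out_total[u]
        # Restart mass, plus mass stuck on nodes without out-edges, returns to the source.
        dangling = sum(p[n] for n in nodes if out_total[n] == 0.0)
        nxt[source] += restart + (1.0 - restart) * dangling
        p = nxt
    return p


# Toy graph with one made-up feature per edge (e.g. a scaled interaction count).
toy_edges = undirected({
    ("s", "a"): [2.0],
    ("s", "b"): [0.0],
    ("a", "c"): [1.0],
    ("b", "c"): [1.0],
})
scores = walk_scores(toy_edges, weights=[1.0], source="s")
# "a" outranks "b" because the s-a edge carries the larger feature value.
ranking = sorted((n for n in scores if n != "s"), key=scores.get, reverse=True)
print(ranking)
```

In the supervised setting the abstract describes, the weights would be chosen so that nodes that later receive links from the source score higher than nodes that do not; the sketch only shows how fixed weights shape the walk.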

international world wide web conferences | 2010

Find me if you can: improving geographical prediction with social and spatial proximity

Lars Backstrom; Eric Sun; Cameron Marlow

Geography and social relationships are inextricably intertwined; the people we interact with on a daily basis almost always live near us. As people spend more time online, data regarding these two dimensions -- geography and social relationships -- are becoming increasingly precise, allowing us to build reliable models to describe their interaction. These models have important implications in the design of location-based services, security intrusion detection, and social media supporting local communities. Using user-supplied address data and the network of associations between members of the Facebook social network, we can directly observe and measure the relationship between geography and friendship. Using these measurements, we introduce an algorithm that predicts the location of an individual from a sparse set of located users with performance that exceeds IP-based geolocation. This algorithm is efficient and scalable, and could be run on a network containing hundreds of millions of users.

Proceedings of the National Academy of Sciences of the United States of America | 2010

Inferring social ties from geographic coincidences

David J. Crandall; Lars Backstrom; Dan Cosley; Siddharth Suri; Daniel P. Huttenlocher; Jon M. Kleinberg

We investigate the extent to which social ties between people can be inferred from co-occurrence in time and space: Given that two people have been in approximately the same geographic locale at approximately the same time, on multiple occasions, how likely are they to know each other? Furthermore, how does this likelihood depend on the spatial and temporal proximity of the co-occurrences? Such issues arise in data originating in both online and offline domains as well as settings that capture interfaces between online and offline behavior. Here we develop a framework for quantifying the answers to such questions, and we apply this framework to publicly available data from a social media site, finding that even a very small number of co-occurrences can result in a high empirical likelihood of a social tie. We then present probabilistic models showing how such large probabilities can arise from a natural model of proximity and co-occurrence in the presence of social ties. In addition to providing a method for establishing some of the first quantifiable estimates of these measures, our findings have potential privacy implications, particularly for the ways in which social structures can be inferred from public online records that capture individuals’ physical locations over time.
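One plausible way to count the co-occurrences discussed above is to discretize space into latitude-longitude cells and require two users to appear in the same cell within a time window. The sketch below makes those assumptions explicit; the cell size, the time window, the record layout, and the function name `cooccurrence_counts` are all illustrative choices rather than details taken from the paper.

```python
import math
from collections import defaultdict
from itertools import combinations


def cooccurrence_counts(records, cell_deg=1.0, max_days=1.0):
    """Count, per user pair, the distinct cells where both appeared within `max_days`.

    records: iterable of (user, latitude, longitude, day), with day as a float.
    """
    by_cell = defaultdict(lambda: defaultdict(list))
    for user, lat, lon, day in records:
        cell = (math.floor(lat / cell_deg), math.floor(lon / cell_deg))
        by_cell[cell][user].append(day)

    counts = defaultdict(int)
    for visits in by_cell.values():
        for a, b in combinations(sorted(visits), 2):
            close_in_time = any(
                abs(da - db) <= max_days for da in visits[a] for db in visits[b]
            )
            if close_in_time:
                counts[(a, b)] += 1
    return dict(counts)


# Made-up records: ann and bob coincide in two cells; ann and cy share a cell
# but are months apart, so that pair is not counted.
records = [
    ("ann", 40.7, -74.2, 10.0),
    ("bob", 40.2, -74.5, 10.5),
    ("ann", 48.8, 2.3, 50.0),
    ("bob", 48.9, 2.4, 50.2),
    ("ann", 51.5, -0.1, 80.0),
    ("cy", 51.5, -0.1, 200.0),
]
print(cooccurrence_counts(records))  # {('ann', 'bob'): 2}
```

With ground-truth friendship labels, these counts could be bucketed to estimate the empirical probability of a tie given k co-occurrences, which is the kind of quantity the abstract refers to.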

Proceedings of the National Academy of Sciences of the United States of America | 2012

Structural diversity in social contagion

Johan Ugander; Lars Backstrom; Cameron Marlow; Jon M. Kleinberg

The concept of contagion has steadily expanded from its original grounding in epidemic disease to describe a vast array of processes that spread across networks, notably social phenomena such as fads, political opinions, the adoption of new technologies, and financial decisions. Traditional models of social contagion have been based on physical analogies with biological contagion, in which the probability that an individual is affected by the contagion grows monotonically with the size of his or her “contact neighborhood”: the number of affected individuals with whom he or she is in contact. Whereas this contact neighborhood hypothesis has formed the underpinning of essentially all current models, it has been challenging to evaluate it due to the difficulty in obtaining detailed data on individual network neighborhoods during the course of a large-scale contagion process. Here we study this question by analyzing the growth of Facebook, a rare example of a social process with genuinely global adoption. We find that the probability of contagion is tightly controlled by the number of connected components in an individual's contact neighborhood, rather than by the actual size of the neighborhood. Surprisingly, once this “structural diversity” is controlled for, the size of the contact neighborhood is in fact generally a negative predictor of contagion. More broadly, our analysis shows how data at the size and resolution of the Facebook network make possible the identification of subtle structural signals that go undetected at smaller scales yet hold pivotal predictive roles for the outcomes of social processes.
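The distinction between neighborhood size and structural diversity can be made concrete with a small sketch. It assumes an undirected graph stored as an adjacency dict and a set of already-affected users; the function name `structural_diversity` and the toy data are hypothetical.

```python
def structural_diversity(graph, node, adopted):
    """Return (size, components) for the contact neighborhood of `node`.

    The contact neighborhood is the set of `node`'s neighbors that are in
    `adopted`. Components are counted in the subgraph induced by those
    neighbors alone, so `node` itself does not link them together.
    """
    contacts = {n for n in graph[node] if n in adopted}
    seen = set()
    components = 0
    for start in contacts:
        if start in seen:
            continue
        components += 1
        seen.add(start)
        stack = [start]
        while stack:  # depth-first search restricted to the contact set
            current = stack.pop()
            for neighbor in graph[current]:
                if neighbor in contacts and neighbor not in seen:
                    seen.add(neighbor)
                    stack.append(neighbor)
    return len(contacts), components


# Toy graph: u knows a, b, c, d; only a and b know each other.
graph = {
    "u": {"a", "b", "c", "d"},
    "a": {"u", "b"},
    "b": {"u", "a"},
    "c": {"u"},
    "d": {"u"},
}
print(structural_diversity(graph, "u", adopted={"a", "b", "c"}))  # (3, 2)
```

Here three affected contacts form only two components, because a and b belong to the same social context. The abstract's finding is that the second number, not the first, tracks the probability of adoption.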

web search and data mining | 2008

Preferential behavior in online groups

Lars Backstrom; Ravi Kumar; Cameron Marlow; Jasmine Novak; Andrew Tomkins

Online communities in the form of message boards, listservs, and newsgroups continue to represent a considerable amount of the social activity on the Internet. Every year thousands of groups flourish while others decline into relative obscurity; likewise, millions of members join a new community every year, some of whom will come to manage or moderate the conversation while others simply sit by the sidelines and observe. These processes of group formation, growth, and dissolution are central in social science, and in an online venue they have ramifications for the design and development of community software. In this paper we explore a large corpus of thriving online communities. These groups vary widely in size, moderation and privacy, and cover an equally diverse set of subject matter. We present a broad range of descriptive statistics of these groups. Using metadata from groups, members, and individual messages, we identify users who post and are replied-to frequently by multiple group members; we classify these high-engagement users based on the longevity of their engagements. We show that users who will go on to become long-lived, highly-engaged users experience significantly better treatment than other users from the moment they join the group, well before there is an opportunity for them to develop a long-standing relationship with members of the group. We present a simple model explaining long-term heavy engagement as a combination of user-dependent and group-dependent factors. Using this model as an analytical tool, we show that properties of the user alone are sufficient to explain 95% of all memberships, but introducing a small amount of per-group information dramatically improves our ability to model users belonging to multiple groups.

web search and data mining | 2013

Characterizing and curating conversation threads: expansion, focus, volume, re-entry

Lars Backstrom; Jon M. Kleinberg; Lillian Lee; Cristian Danescu-Niculescu-Mizil

Discussion threads form a central part of the experience on many Web sites, including social networking sites such as Facebook and Google Plus and knowledge creation sites such as Wikipedia. To help users manage the challenge of allocating their attention among the discussions that are relevant to them, there has been a growing need for the algorithmic curation of on-line conversations --- the development of automated methods to select a subset of discussions to present to a user. Here we consider two key sub-problems inherent in conversational curation: length prediction --- predicting the number of comments a discussion thread will receive --- and the novel task of re-entry prediction --- predicting whether a user who has participated in a thread will later contribute another comment to it. The first of these sub-problems arises in estimating how interesting a thread is, in the sense of generating a lot of conversation; the second can help determine whether users should be kept notified of the progress of a thread to which they have already contributed. We develop and evaluate a range of approaches for these tasks, based on an analysis of the network structure and arrival pattern among the participants, as well as a novel dichotomy in the structure of long threads. We find that for both tasks, learning-based approaches using these sources of information.

international world wide web conferences | 2011

Network bucket testing

Lars Backstrom; Jon M. Kleinberg

Bucket testing, also known as A/B testing, is a practice that is widely used by on-line sites with large audiences: in a simple version of the methodology, one evaluates a new feature on the site by exposing it to a very small fraction of the total user population and measuring its effect on this exposed group. For traditional uses of this technique, uniform independent sampling of the population is often enough to produce an exposed group that can serve as a statistical proxy for the full population. In on-line social network applications, however, one often wishes to perform a more complex test: evaluating a new social feature that will only produce an effect if a user and some number of his or her friends are exposed to it. In this case, independent uniform draws from the population will be unlikely to produce groups that contain users together with their friends, and so the construction of the sample must take the network structure into account. This leads quickly to challenging combinatorial problems, since there is an inherent tension between producing enough correlation to select users and their friends, but also enough uniformity and independence that the selected group is a reasonable sample of the full population. Here we develop an algorithmic framework for bucket testing in a network that addresses these challenges. First we describe a novel walk-based sampling method for producing samples of nodes that are internally well-connected but also approximately uniform over the population. Then we show how a collection of multiple independent subgraphs constructed this way can yield reasonable samples for testing. We demonstrate the effectiveness of our algorithms through computational experiments on large portions of the Facebook network.
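The claim that independent uniform sampling rarely captures a user together with friends follows from a short binomial calculation. The sketch below is only that calculation, not the paper's walk-based sampler; the sampling rate, degree, and threshold are made-up example values, and `prob_user_with_friends` is a hypothetical name.

```python
from math import comb


def prob_user_with_friends(p, degree, k):
    """P(a user is sampled AND at least k of their `degree` friends are sampled).

    Assumes every person is included independently with probability p.
    """
    fewer_than_k = sum(
        comb(degree, i) * p**i * (1.0 - p) ** (degree - i) for i in range(k)
    )
    return p * (1.0 - fewer_than_k)


# Example values: a 1% bucket, a user with 100 friends, a feature needing 2 exposed friends.
p, degree, k = 0.01, 100, 2
joint = prob_user_with_friends(p, degree, k)
print(f"P(user and >= {k} friends in bucket) = {joint:.5f}")
print(f"Fraction of sampled users that are usable = {joint / p:.3f}")
```

Under these example numbers only about a quarter of the sampled users would have two or more sampled friends, and the fraction falls quickly for smaller buckets or lower-degree users, which is why the sample construction has to follow the network.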
