Jaewon Yang | Researchain

Archive Network Publication Hotspot Collaboration

Network

Latest external collaboration on country level. Dive into details by clicking on the dots.

Explore More

Hotspot

Dive into the research topics where Jaewon Yang is active.

Explore More

Publication

Featured researches published by Jaewon Yang.

web search and data mining | 2011

Patterns of temporal variation in online media

Jaewon Yang; Jure Leskovec

Online content exhibits rich temporal dynamics, and diverse realtime user generated content further intensifies this process. However, temporal patterns by which online content grows and fades over time, and by which different pieces of content compete for attention remain largely unexplored.n We study temporal patterns associated with online content and how the contents popularity grows and fades over time. The attention that content receives on the Web varies depending on many factors and occurs on very different time scales and at different resolutions. In order to uncover the temporal dynamics of online content we formulate a time series clustering problem using a similarity metric that is invariant to scaling and shifting. We develop the K-Spectral Centroid (K-SC) clustering algorithm that effectively finds cluster centroids with our similarity measure. By applying an adaptive wavelet-based incremental approach to clustering, we scale K-SC to large data sets.n We demonstrate our approach on two massive datasets: a set of 580 million Tweets, and a set of 170 million blog posts and news media articles. We find that K-SC outperforms the K-means clustering algorithm in finding distinct shapes of time series. Our analysis shows that there are six main temporal shapes of attention of online content. We also present a simple model that reliably predicts the shape of attention by using information about only a small number of participants. Our analyses offer insight into common temporal patterns of the content on theWeb and broaden the understanding of the dynamics of human attention.

Knowledge and Information Systems | 2015

Defining and evaluating network communities based on ground-truth

Jaewon Yang; Jure Leskovec

Nodes in real-world networks organize into densely linked communities where edges appear with high concentration among the members of the community. Identifying such communities of nodes has proven to be a challenging task due to a plethora of definitions of network communities, intractability of methods for detecting them, and the issues with evaluation which stem from the lack of a reliable gold-standard ground-truth. In this paper, we distinguish between structural and functional definitions of network communities. Structural definitions of communities are based on connectivity patterns, like the density of connections between the community members, while functional definitions are based on (often unobserved) common function or role of the community members in the network. We argue that the goal of network community detection is to extract functional communities based on the connectivity structure of the nodes in the network. We then identify networks with explicitly labeled functional communities to which we refer as ground-truth communities. In particular, we study a set of 230 large real-world social, collaboration, and information networks where nodes explicitly state their community memberships. For example, in social networks, nodes explicitly join various interest-based social groups. We use such social groups to define a reliable and robust notion of ground-truth communities. We then propose a methodology, which allows us to compare and quantitatively evaluate how different structural definitions of communities correspond to ground-truth functional communities. We study 13 commonly used structural definitions of communities and examine their sensitivity, robustness and performance in identifying the ground-truth. We show that the 13 structural definitions are heavily correlated and naturally group into four classes. We find that two of these definitions, Conductance and Triad participation ratio, consistently give the best performance in identifying ground-truth communities. We also investigate a task of detecting communities given a single seed node. We extend the local spectral clustering algorithm into a heuristic parameter-free community detection method that easily scales to networks with more than 100xa0million nodes. The proposed method achieves 30xa0% relative improvement over current local clustering methods.

international conference on data mining | 2010

Modeling Information Diffusion in Implicit Networks

Jaewon Yang; Jure Leskovec

Social media forms a central domain for the production and dissemination of real-time information. Even though such flows of information have traditionally been thought of as diffusion processes over social networks, the underlying phenomena are the result of a complex web of interactions among numerous participants. Here we develop the Linear Influence Model where rather than requiring the knowledge of the social network and then modeling the diffusion by predicting which node will influence which other nodes in the network, we focus on modeling the global influence of a node on the rate of diffusion through the (implicit) network. We model the number of newly infected nodes as a function of which other nodes got infected in the past. For each node we estimate an influence function that quantifies how many subsequent infections can be attributed to the influence of that node over time. A nonparametric formulation of the model leads to a simple least squares problem that can be solved on large datasets. We validate our model on a set of 500 million tweets and a set of 170 million news articles and blog posts. We show that the Linear Influence Model accurately models influences of nodes and reliably predicts the temporal dynamics of information diffusion. We find that patterns of influence of individual participants differ significantly depending on the type of the node and the topic of the information.

web search and data mining | 2013

Overlapping community detection at scale: a nonnegative matrix factorization approach

Jaewon Yang; Jure Leskovec

Network communities represent basic structures for understanding the organization of real-world networks. A community (also referred to as a module or a cluster) is typically thought of as a group of nodes with more connections amongst its members than between its members and the remainder of the network. Communities in networks also overlap as nodes belong to multiple clusters at once. Due to the difficulties in evaluating the detected communities and the lack of scalable algorithms, the task of overlapping community detection in large networks largely remains an open problem.n In this paper we present BIGCLAM (Cluster Affiliation Model for Big Networks), an overlapping community detection method that scales to large networks of millions of nodes and edges. We build on a novel observation that overlaps between communities are densely connected. This is in sharp contrast with present community detection methods which implicitly assume that overlaps between communities are sparsely connected and thus cannot properly extract overlapping communities in networks. In this paper, we develop a model-based community detection algorithm that can detect densely overlapping, hierarchically nested as well as non-overlapping communities in massive networks. We evaluate our algorithm on 6 large social, collaboration and information networks with ground-truth community information. Experiments show state of the art performance both in terms of the quality of detected communities as well as in speed and scalability of our algorithm.

international conference on data mining | 2013

Community Detection in Networks with Node Attributes

Jaewon Yang; Julian McAuley; Jure Leskovec

Community detection algorithms are fundamental tools that allow us to uncover organizational principles in networks. When detecting communities, there are two possible sources of information one can use: the network structure, and the features and attributes of nodes. Even though communities form around nodes that have common edges and common attributes, typically, algorithms have only focused on one of these two data modalities: community detection algorithms traditionally focus only on the network structure, while clustering algorithms mostly consider only node attributes. In this paper, we develop Communities from Edge Structure and Node Attributes (CESNA), an accurate and scalable algorithm for detecting overlapping communities in networks with node attributes. CESNA statistically models the interaction between the network structure and the node attributes, which leads to more accurate community detection as well as improved robustness in the presence of noise in the network structure. CESNA has a linear runtime in the network size and is able to process networks an order of magnitude larger than comparable approaches. Last, CESNA also helps with the interpretation of detected communities by finding relevant node attributes for each community.

international conference on data mining | 2012

Defining and Evaluating Network Communities Based on Ground-Truth

Jaewon Yang; Jure Leskovec

Nodes in real-world networks organize into densely linked communities where edges appear with high concentration among the members of the community. Identifying such communities of nodes has proven to be a challenging task mainly due to a plethora of definitions of a community, intractability of algorithms, issues with evaluation and the lack of a reliable gold-standard ground-truth. In this paper we study a set of 230 large real-world social, collaboration and information networks where nodes explicitly state their group memberships. For example, in social networks nodes explicitly join various interest based social groups. We use such groups to define a reliable and robust notion of ground-truth communities. We then propose a methodology which allows us to compare and quantitatively evaluate how different structural definitions of network communities correspond to ground-truth communities. We choose 13 commonly used structural definitions of network communities and examine their sensitivity, robustness and performance in identifying the ground-truth. We show that the 13 structural definitions are heavily correlated and naturally group into four classes. We find that two of these definitions, Conductance and Triad-participation-ratio, consistently give the best performance in identifying ground-truth communities. We also investigate a task of detecting communities given a single seed node. We extend the local spectral clustering algorithm into a heuristic parameter-free community detection method that easily scales to networks with more than hundred million nodes. The proposed method achieves 30% relative improvement over current local clustering methods.

knowledge discovery and data mining | 2013

Information cartography: creating zoomable, large-scale maps of information

Dafna Shahaf; Jaewon Yang; Caroline Suen; Jeff Jacobs; Heidi Wang; Jure Leskovec

In an era of information overload, many people struggle to make sense of complex stories, such as presidential elections or economic reforms. We propose a methodology for creating structured summaries of information, which we call zoomable metro maps. Just as cartographic maps have been relied upon for centuries to help us understand our surroundings, metro maps can help us understand the information landscape. Given large collection of news documents our proposed algorithm generates a map of connections that explicitly captures story development. As different users might be interested in different levels of granularity, the maps are zoomable, with each level of zoom showing finer details and interactions. In this paper, we formalize characteristics of good zoomable maps and formulate their construction as an optimization problem. We provide efficient, scalable methods with theoretical guarantees for generating maps. Pilot user studies over real-world datasets demonstrate that our method helps users comprehend complex stories better than prior work.

international world wide web conferences | 2014

Finding progression stages in time-evolving event sequences

Jaewon Yang; Julian McAuley; Jure Leskovec; Paea LePendu; Nigam H. Shah

Event sequences, such as patients medical histories or users sequences of product reviews, trace how individuals progress over time. Identifying common patterns, or progression stages, in such event sequences is a challenging task because not every individual follows the same evolutionary pattern, stages may have very different lengths, and individuals may progress at different rates. In this paper, we develop a model-based method for discovering common progression stages in general event sequences. We develop a generative model in which each sequence belongs to a class, and sequences from a given class pass through a common set of stages, where each sequence evolves at its own rate. We then develop a scalable algorithm to infer classes of sequences, while also segmenting each sequence into a set of stages. We evaluate our method on event sequences, ranging from patients medical histories to online news and navigational traces from the Web. The evaluation shows that our methodology can predict future events in a sequence, while also accurately inferring meaningful progression stages, and effectively grouping sequences based on common progression patterns. More generally, our methodology allows us to reason about how event sequences progress over time, by discovering patterns and categories of temporal evolution in large-scale datasets of events.

knowledge discovery and data mining | 2013

Estimating sharer reputation via social data calibration

Jaewon Yang; Bee-Chung Chen; Deepak Agarwal

Online social networks have become important channels for users to share content with their connections and diffuse information. Although much work has been done to identify socially influential users, the problem of finding reputable sharers, who share good content, has received relatively little attention. Availability of such reputation scores can be useful or various applications like recommending people to follow, procuring high quality content in a scalable way, creating a content reputation economy to incentivize high quality sharing, and many more. To estimate sharer reputation, it is intuitive to leverage data that records how recipients respond (through clicking, liking, etc.) to content items shared by a sharer. However, such data is usually biased --- it has a selection bias since the shared items can only be seen and responded to by users connected to the sharer in most social networks, and it has a response bias since the response is usually influenced by the relationship between the sharer and the recipient (which may not indicate whether the shared content is good). To correct for such biases, we propose to utilize an additional data source that provides unbiased goodness estimates for a small set of shared items, and calibrate biased social data through a novel multi-level hierarchical model that describes how the unbiased data and biased data are jointly generated according to sharer reputation scores. The unbiased data also provides the ground truth for quantitative evaluation of different methods. Experiments based on such ground-truth data show that our proposed model significantly outperforms existing methods that estimate social influence using biased social data.

arXiv: Social and Information Networks | 2012