Yiye Ruan | Researchain

Publication

Featured research published by Yiye Ruan.

international world wide web conferences | 2013

Efficient community detection in large networks using content and links

Yiye Ruan; David Fuhry; Srinivasan Parthasarathy

In this paper we discuss a very simple approach to combining content and link information in graph structures for the purpose of community discovery, a fundamental task in network analysis. Our approach hinges on the basic intuition that many networks contain noise in the link structure and that content information can help strengthen the community signal. This enables one to eliminate the impact of noise (false positives and false negatives), which is particularly prevalent in online social networks and Web-scale information networks. Specifically, we introduce a measure of signal strength between two nodes in the network by fusing their link strength with content similarity. Link strength is estimated based on whether the link is likely (with high probability) to reside within a community. Content similarity is estimated through cosine similarity or the Jaccard coefficient. We discuss a simple mechanism for fusing content and link similarity. We then present a biased edge sampling procedure which retains edges that are locally relevant for each graph node. The resulting backbone graph can be clustered using standard community discovery algorithms such as Metis and Markov clustering. Through extensive experiments on multiple real-world datasets (Flickr, Wikipedia and CiteSeer) with varying sizes and characteristics, we demonstrate the effectiveness and efficiency of our methods over state-of-the-art learning and mining approaches, several of which also attempt to combine link and content analysis for the purposes of community discovery. Specifically, we always find a qualitative benefit when combining content with link analysis. Additionally, our biased graph sampling approach realizes a quantitative benefit in that it is typically several orders of magnitude faster than competing approaches.
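
As a rough illustration of the fusion step described in the abstract above, the sketch below blends a link-based score with a content-based score for a pair of nodes. The abstract does not state the formula, so the weighted sum, the `alpha` parameter, the use of Jaccard overlap of closed neighborhoods as the link score, and all function names here are assumptions made for illustration, not the authors' method.

```python
import math


def cosine(u, v):
    """Cosine similarity between two sparse term-weight dicts."""
    dot = sum(w * v.get(term, 0.0) for term, w in u.items())
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


def jaccard(a, b):
    """Jaccard coefficient between two sets."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def fused_strength(u, v, adj, content, alpha=0.5):
    """Hypothetical fused score: alpha * link score + (1 - alpha) * content score.

    The link score is the Jaccard overlap of the two closed neighborhoods
    (an assumption); the content score is cosine similarity of term weights.
    """
    link_score = jaccard(adj[u] | {u}, adj[v] | {v})
    content_score = cosine(content[u], content[v])
    return alpha * link_score + (1.0 - alpha) * content_score


# Toy graph: a, b, c form a triangle; d hangs off c and has unrelated content.
adj = {"a": {"b", "c"}, "b": {"a", "c"}, "c": {"a", "b", "d"}, "d": {"c"}}
content = {
    "a": {"graph": 1.0, "cluster": 1.0},
    "b": {"graph": 1.0, "cluster": 1.0},
    "c": {"graph": 1.0},
    "d": {"cooking": 1.0},
}

print(fused_strength("a", "b", adj, content))  # strong link and content agreement
print(fused_strength("c", "d", adj, content))  # weaker link overlap, no shared terms
```

In this toy setup the edge between a and b scores much higher than the edge between c and d, so a sampling step that keeps each node's highest-scoring edges would tend to drop the latter.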

international conference on management of data | 2011

Local graph sparsification for scalable clustering

Venu Satuluri; Srinivasan Parthasarathy; Yiye Ruan

In this paper we look at how to sparsify a graph, i.e., how to reduce the edge set while keeping the nodes intact, so as to enable faster graph clustering without sacrificing quality. The main idea behind our approach is to preferentially retain the edges that are likely to be part of the same cluster. We propose to rank edges using a simple similarity-based heuristic that we efficiently compute by comparing the minhash signatures of the nodes incident to the edge. For each node, we select the top few edges to be retained in the sparsified graph. Extensive empirical results on several real networks and using four state-of-the-art graph clustering and community discovery algorithms reveal that our proposed approach realizes excellent speedups (often in the range of 10x to 50x), with little or no deterioration in the quality of the resulting clusters. In fact, for at least two of the four clustering algorithms, our sparsification consistently enables higher clustering accuracies.
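
The ranking-and-retention idea in the abstract above can be sketched as follows. This is a minimal illustration, not the paper's implementation: the linear hash family, the use of closed neighborhoods as the hashed sets, the per-node budget of `degree ** exponent` edges, and every function and parameter name are assumptions.

```python
import math
import random

PRIME = 2_147_483_647  # Mersenne prime 2^31 - 1, used as the hash modulus


def make_hash_funcs(k, rng):
    """Draw k random linear hash functions x -> (a * x + b) mod PRIME."""
    return [(rng.randrange(1, PRIME), rng.randrange(0, PRIME)) for _ in range(k)]


def minhash_signature(items, hash_funcs):
    """One minimum hash value per hash function for a set of integer ids."""
    return [min((a * x + b) % PRIME for x in items) for a, b in hash_funcs]


def estimated_jaccard(sig_u, sig_v):
    """Fraction of signature positions that agree; estimates Jaccard similarity."""
    matches = sum(1 for x, y in zip(sig_u, sig_v) if x == y)
    return matches / len(sig_u)


def sparsify(adj, num_hashes=64, exponent=0.5, seed=0):
    """Keep, for each node, its top-ranked edges by estimated similarity.

    adj maps an integer node id to the set of its neighbors (undirected).
    Returns a set of (u, v) pairs with u < v.
    """
    rng = random.Random(seed)
    funcs = make_hash_funcs(num_hashes, rng)
    # Assumption: hash the closed neighborhood so adjacent nodes always overlap.
    sigs = {u: minhash_signature(adj[u] | {u}, funcs) for u in adj}
    kept = set()
    for u, nbrs in adj.items():
        if not nbrs:
            continue
        budget = max(1, math.ceil(len(nbrs) ** exponent))
        # Rank neighbors by descending similarity, breaking ties by node id.
        ranked = sorted(nbrs, key=lambda v: (-estimated_jaccard(sigs[u], sigs[v]), v))
        for v in ranked[:budget]:
            kept.add((min(u, v), max(u, v)))
    return kept


# Toy graph: two 4-cliques (0-3 and 4-7) joined by a single bridge edge 3-4.
graph = {
    0: {1, 2, 3}, 1: {0, 2, 3}, 2: {0, 1, 3}, 3: {0, 1, 2, 4},
    4: {3, 5, 6, 7}, 5: {4, 6, 7}, 6: {4, 5, 7}, 7: {4, 5, 6},
}
print(sorted(sparsify(graph)))
```

Because each node keeps at least one edge, no node is isolated by the sparsification itself. In this toy graph the bridge endpoints share few neighbors, so the bridge would typically rank last for both of them.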

knowledge discovery and data mining | 2012

A framework for summarizing and analyzing twitter feeds

Xintian Yang; Amol Ghoting; Yiye Ruan; Srinivasan Parthasarathy

The firehose of data generated by users on social networking and microblogging sites such as Facebook and Twitter is enormous. Real-time analytics on such data is challenging, with most current efforts largely focusing on the efficient querying and retrieval of data produced recently. In this paper, we present a dynamic pattern-driven approach to summarize data produced by Twitter feeds. We develop a novel approach to maintain an in-memory summary while retaining sufficient information to facilitate a range of user-specific and topic-specific temporal analytics. We empirically compare our approach with several state-of-the-art pattern summarization approaches along the axes of storage cost, query accuracy, query flexibility, and efficiency using real data from Twitter. We find that the proposed approach is not only scalable but also outperforms existing approaches by a large margin.

Social Network Data Analytics | 2011

Community Discovery in Social Networks: Applications, Methods and Emerging Trends

Srinivasan Parthasarathy; Yiye Ruan; Venu Satuluri

Data sets originating from many different real-world domains can be represented in the form of interaction networks in a very natural, concise and meaningful fashion. This is particularly true in the social context, especially given recent advances in Internet technologies and Web 2.0 applications leading to a diverse range of evolving social networks. Analysis of such networks can result in the discovery of important patterns and potentially shed light on important properties governing the growth of such networks.

conference on online social networks | 2014

Simultaneous detection of communities and roles from large networks

Yiye Ruan; Srinivasan Parthasarathy

Community detection and structural role detection are two distinct but closely related perspectives in network analytics. In this paper, we propose RC-Joint, a novel algorithm to simultaneously identify community and structural role assignments in a network. Rather than being agnostic to one assignment while inferring the other, RC-Joint employs a principled approach to guide the detection process in a nonparametric fashion and ensures that the two sets of assignments are sufficiently different from each other. Roles and communities generated by RC-Joint are both soft assignments, reflecting the fact that many real-world networks have overlapping community structures and role memberships. By comparing with state-of-the-art methods in community detection and structural role detection, we demonstrate that RC-Joint harvests the best of both worlds and outperforms existing approaches, while still being competitive in efficiency. We also investigate the effect of different initialization schemes, and find that using the results of RC-Joint on a sparse network as the seed often leads to faster convergence and higher quality.

Archive | 2015

Community Discovery: Simple and Scalable Approaches

Yiye Ruan; David Fuhry; Jiongqian Liang; Yu Wang; Srinivasan Parthasarathy

The increasing size and complexity of online social networks have brought distinct challenges to the task of community discovery. A community discovery algorithm needs to be efficient, not taking a prohibitive amount of time to finish. The algorithm should also be scalable, capable of handling large networks containing billions of edges or even more. Furthermore, a community discovery algorithm should be effective in that it produces community assignments of high quality. In this chapter, we present a selection of algorithms that follow simple design principles, and have proven highly effective and efficient according to extensive empirical evaluations. We start by discussing a generic approach to community discovery that combines multilevel graph contraction with core clustering algorithms. Next we describe the use of network sampling in community discovery, where the goal is to reduce the number of nodes and/or edges while retaining the network’s underlying community structure. Finally, we review research efforts that leverage various parallel and distributed computing paradigms in community discovery, which can facilitate finding communities in tera- and peta-scale networks.

conference on information and knowledge management | 2014

Component Detection in Directed Networks

Yu-Keng Shih; Sungmin Kim; Yiye Ruan; Jinxing Cheng; Abhishek Gattani; Tao Shi; Srinivasan Parthasarathy

Community detection has been one of the fundamental problems in network analysis. Results from community detection (for example, grouping of products by latent category) can also serve as information nuggets to other business applications, such as product recommendation or taxonomy building. Because several real networks are naturally directed, e.g., the World Wide Web, some recent studies proposed algorithms for detecting various types of communities in a directed network. However, few of them considered that nodes play two different roles, source and terminal, in a directed network. In this paper, we adopt a novel concept of communities, the directional community, and propose a new algorithm based on Markov Clustering to detect directional communities. We first compare our algorithm, Dual R-MCL, on synthetic networks with two recent algorithms also designed for detecting directional communities. We show that Dual R-MCL can detect directional communities with significantly higher accuracy and 3x to 25x faster than the two other algorithms. Second, we compare a set of directed network community detection algorithms on a one-day Twitter interaction network and demonstrate that Dual R-MCL can generate clusters more correctly matched to hashtags. Finally, we exhibit our algorithm's capacity to identify directional communities from product description networks, where nodes are otherwise not directly connected. Results indicate that directional communities exist in real networks, and Dual R-MCL can effectively detect these directional communities. We believe it will enable the discovery of interesting components in diverse types of networks where existing methods cannot, and that it has strong application value.
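
For readers unfamiliar with Markov Clustering, on which the abstract above says Dual R-MCL is based, the sketch below shows the plain, undirected MCL loop of alternating expansion and inflation. It is not Dual R-MCL: it has no notion of source and terminal roles or directional communities, and the function names, dense-matrix representation, and default parameters are assumptions chosen for brevity.

```python
def normalize_columns(m):
    """Scale each column of a square matrix in place so that it sums to 1."""
    n = len(m)
    for j in range(n):
        total = sum(m[i][j] for i in range(n))
        if total > 0:
            for i in range(n):
                m[i][j] /= total
    return m


def matmul(a, b):
    """Dense product of two square matrices given as lists of rows."""
    n = len(a)
    return [[sum(a[i][k] * b[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]


def mcl(adj_matrix, inflation=2.0, iterations=30):
    """Plain Markov Clustering on a small undirected graph.

    adj_matrix is a symmetric 0/1 matrix. Returns a sorted list of clusters,
    each a sorted list of node indices.
    """
    n = len(adj_matrix)
    # Add self-loops, then turn the matrix into a column-stochastic flow matrix.
    m = [[float(adj_matrix[i][j]) + (1.0 if i == j else 0.0) for j in range(n)]
         for i in range(n)]
    normalize_columns(m)
    for _ in range(iterations):
        m = matmul(m, m)                                  # expansion: spread flow
        m = [[x ** inflation for x in row] for row in m]  # inflation: sharpen flow
        normalize_columns(m)
    # Assign each node (column) to the row that attracts most of its flow.
    clusters = {}
    for j in range(n):
        attractor = max(range(n), key=lambda i: m[i][j])
        clusters.setdefault(attractor, set()).add(j)
    return sorted(sorted(c) for c in clusters.values())


# Toy graph: two triangles (0-1-2 and 3-4-5) joined by the edge 2-3.
toy = [
    [0, 1, 1, 0, 0, 0],
    [1, 0, 1, 0, 0, 0],
    [1, 1, 0, 1, 0, 0],
    [0, 0, 1, 0, 1, 1],
    [0, 0, 0, 1, 0, 1],
    [0, 0, 0, 1, 1, 0],
]
print(mcl(toy))
```

Higher `inflation` values sharpen the flow more aggressively and tend to produce smaller clusters; the dense matrix product used here is only practical for toy inputs.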

Archive | 2011