Is this you? Create Your Porfile

Zhijun Yin

University of Illinois at Urbana–Champaign

Archive Network Publication Hotspot Collaboration

Network

Latest external collaboration on country level. Dive into details by clicking on the dots.

Explore More

Hotspot

Dive into the research topics where Zhijun Yin is active.

Explore More

Publication

Featured researches published by Zhijun Yin.

extending database technology | 2009

RankClus: integrating clustering with ranking for heterogeneous information network analysis

Yizhou Sun; Jiawei Han; Peixiang Zhao; Zhijun Yin; Hong Cheng; Tianyi Wu

As information networks become ubiquitous, extracting knowledge from information networks has become an important task. Both ranking and clustering can provide overall views on information network data, and each has been a hot topic by itself. However, ranking objects globally without considering which clusters they belong to often leads to dumb results, e.g., ranking database and computer architecture conferences together may not make much sense. Similarly, clustering a huge number of objects (e.g., thousands of authors) in one huge cluster without distinction is dull as well. In this paper, we address the problem of generating clusters for a specified type of objects, as well as ranking information for all types of objects based on these clusters in a multi-typed (i.e., heterogeneous) information network. A novel clustering framework called RankClus is proposed that directly generates clusters integrated with ranking. Based on initial K clusters, ranking is applied separately, which serves as a good measure for each cluster. Then, we use a mixture model to decompose each object into a K-dimensional vector, where each dimension is a component coefficient with respect to a cluster, which is measured by rank distribution. Objects then are reassigned to the nearest cluster under the new measure space to improve clustering. As a result, quality of clustering and ranking are mutually enhanced, which means that the clusters are getting more accurate and the ranking is getting more meaningful. Such a progressive refinement process iterates until little change can be made. Our experiment results show that RankClus can generate more accurate clusters and in a more efficient way than the state-of-the-art link-based clustering methods. Moreover, the clustering results with ranks can provide more informative views of data compared with traditional clustering.

international world wide web conferences | 2011

Geographical topic discovery and comparison

Zhijun Yin; Liangliang Cao; Jiawei Han; ChengXiang Zhai; Thomas S. Huang

This paper studies the problem of discovering and comparing geographical topics from GPS-associated documents. GPS-associated documents become popular with the pervasiveness of location-acquisition technologies. For example, in Flickr, the geo-tagged photos are associated with tags and GPS locations. In Twitter, the locations of the tweets can be identified by the GPS locations from smart phones. Many interesting concepts, including cultures, scenes, and product sales, correspond to specialized geographical distributions. In this paper, we are interested in two questions: (1) how to discover different topics of interests that are coherent in geographical regions? (2) how to compare several topics across different geographical locations? To answer these questions, this paper proposes and compares three ways of modeling geographical topics: location-driven model, text-driven model, and a novel joint model called LGTA (Latent Geographical Topic Analysis) that combines location and text. To make a fair comparison, we collect several representative datasets from Flickr website including Landscape, Activity, Manhattan, National park, Festival, Car, and Food. The results show that the first two methods work in some datasets but fail in others. LGTA works well in all these datasets at not only finding regions of interests but also providing effective comparisons of the topics across different locations. The results confirm our hypothesis that the geographical distributions can help modeling topics, while topics provide important cues to group different geographical regions.

Sigkdd Explorations | 2010

Survey on social tagging techniques

Manish Gupta; Rui Li; Zhijun Yin; Jiawei Han

Social tagging on online portals has become a trend now. It has emerged as one of the best ways of associating metadata with web objects. With the increase in the kinds of web objects becoming available, collaborative tagging of such objects is also developing along new dimensions. This popularity has led to a vast literature on social tagging. In this survey paper, we would like to summarize different techniques employed to study various aspects of tagging. Broadly, we would discuss about properties of tag streams, tagging models, tag semantics, generating recommendations using tags, visualizations of tags, applications of tags and problems associated with tagging usage. We would discuss topics like why people tag, what influences the choice of tags, how to model the tagging process, kinds of tags, different power laws observed in tagging domain, how tags are created, how to choose the right tags for recommendation, etc. We conclude with thoughts on future work in the area.

knowledge discovery and data mining | 2009

Exploring social tagging graph for web object classification

Zhijun Yin; Rui Li; Qiaozhu Mei; Jiawei Han

This paper studies web object classification problem with the novel exploration of social tags. Automatically classifying web objects into manageable semantic categories has long been a fundamental preprocess for indexing, browsing, searching, and mining these objects. The explosive growth of heterogeneous web objects, especially non-textual objects such as products, pictures, and videos, has made the problem of web classification increasingly challenging. Such objects often suffer from a lack of easy-extractable features with semantic information, interconnections between each other, as well as training examples with category labels. In this paper, we explore the social tagging data to bridge this gap. We cast web object classification problem as an optimization problem on a graph of objects and tags. We then propose an efficient algorithm which not only utilizes social tags as enriched semantic features for the objects, but also infers the categories of unlabeled objects from both homogeneous and heterogeneous labeled objects, through the implicit connection of social tags. Experiment results show that the exploration of social tags effectively boosts web object classification. Our algorithm significantly outperforms the state-of-the-art of general classification methods.

advances in social networks analysis and mining | 2010

A Unified Framework for Link Recommendation Using Random Walks

Zhijun Yin; Manish Gupta; Tim Weninger; Jiawei Han

The phenomenal success of social networking sites, such as Facebook, Twitter and LinkedIn, has revolutionized the way people communicate. This paradigm has attracted the attention of researchers that wish to study the corresponding social and technological problems. Link recommendation is a critical task that not only helps increase the linkage inside the network and also improves the user experience. In an effective link recommendation algorithm it is essential to identify the factors that influence link creation. This paper enumerates several of these intuitive criteria and proposes an approach which satisfies these factors. This approach estimates link relevance by using random walk algorithm on an augmented social graph with both attribute and structure information. The global and local influences of the attributes are leveraged in the framework as well. Other than link recommendation, our framework can also rank the attributes in the network. Experiments on DBLP and IMDB data sets demonstrate that our method outperforms state-of-the-art methods for link recommendation.

international world wide web conferences | 2010

LINKREC: a unified framework for link recommendation with user attributes and graph structure

Zhijun Yin; Manish Gupta; Tim Weninger; Jiawei Han

With the phenomenal success of networking sites (e.g., Facebook, Twitter and LinkedIn), social networks have drawn substantial attention. On online social networking sites, link recommendation is a critical task that not only helps improve user experience but also plays an essential role in network growth. In this paper we propose several link recommendation criteria, based on both user attributes and graph structure. To discover the candidates that satisfy these criteria, link relevance is estimated using a random walk algorithm on an augmented social graph with both attribute and structure information. The global and local influence of the attributes is leveraged in the framework as well. Besides link recommendation, our framework can also rank attributes in a social network. Experiments on DBLP and IMDB data sets demonstrate that our method outperforms state-of-the-art methods based on network structure and node attribute information for link recommendation.

ACM Transactions on Intelligent Systems and Technology | 2012

Latent Community Topic Analysis: Integration of Community Discovery with Topic Modeling

Zhijun Yin; Liangliang Cao; Quanquan Gu; Jiawei Han

This article studies the problem of latent community topic analysis in text-associated graphs. With the development of social media, a lot of user-generated content is available with user networks. Along with rich information in networks, user graphs can be extended with text information associated with nodes. Topic modeling is a classic problem in text mining and it is interesting to discover the latent topics in text-associated graphs. Different from traditional topic modeling methods considering links, we incorporate community discovery into topic analysis in text-associated graphs to guarantee the topical coherence in the communities so that users in the same community are closely linked to each other and share common latent topics. We handle topic modeling and community discovery in the same framework. In our model we separate the concepts of community and topic, so one community can correspond to multiple topics and multiple communities can share the same topic. We compare different methods and perform extensive experiments on two real datasets. The results confirm our hypothesis that topics could help understand community structure, while community structure could help model topics.

international conference on management of data | 2008

Sampling cube: a framework for statistical olap over sampling data

Xiaolei Li; Jiawei Han; Zhijun Yin; Jae Gil Lee; Yizhou Sun

Sampling is a popular method of data collection when it is impossible or too costly to reach the entire population. For example, television show ratings in the United States are gathered from a sample of roughly 5,000 households. To use the results effectively, the samples are further partitioned in a multidimensional space based on multiple attribute values. This naturally leads to the desirability of OLAP (Online Analytical Processing) over sampling data. However, unlike traditional data, sampling data is inherently uncertain, i.e., not representing the full data in the population. Thus, it is desirable to return not only query results but also the confidence intervals indicating the reliability of the results. Moreover, a certain segment in a multidimensional space may contain none or too few samples. This requires some additional analysis to return trustable results. In this paper we propose a Sampling Cube framework, which efficiently calculates confidence intervals for any multidimensional query and uses the OLAP structure to group similar segments to increase sampling size when needed. Further, to handle high dimensional data, a Sampling Cube Shell method is proposed to effectively reduce the storage requirement while still preserving query result quality.

international conference on data mining | 2011

LPTA: A Probabilistic Model for Latent Periodic Topic Analysis

Zhijun Yin; Liangliang Cao; Jiawei Han; ChengXiang Zhai; Thomas S. Huang

This paper studies the problem of latent periodic topic analysis from time stamped documents. The examples of time stamped documents include news articles, sales records, financial reports, TV programs, and more recently, posts from social media websites such as Flickr, Twitter, and Face book. Different from detecting periodic patterns in traditional time series database, we discover the topics of coherent semantics and periodic characteristics where a topic is represented by a distribution of words. We propose a model called LPTA (Latent Periodic Topic Analysis) that exploits the periodicity of the terms as well as term co-occurrences. To show the effectiveness of our model, we collect several representative datasets including Seminar, DBLP and Flickr. The results show that our model can discover the latent periodic topics effectively and leverage the information from both text and time well.

international conference on management of data | 2008

BibNetMiner: mining bibliographic information networks

Yizhou Sun; Tianyi Wu; Zhijun Yin; Hong Cheng; Jiawei Han; Xiaoxin Yin; Peixiang Zhao

Online bibliographic databases, such as DBLP in computer science and PubMed in medical sciences, contain abundant information about research publications in different fields. Each such database forms a gigantic information network (hence called BibNet), connecting in complex ways research papers, authors, conferences/journals, and possibly citation information as well, and provides a fertile land for information network analysis. Our BibNetMiner is designed for sophisticated information network mining on such bibliographic databases. In this demo, we will take the DBLP database as an example, demonstrate several attractive functions of BibNetMiner, including clustering, ranking and profiling of conferences and authors based on the research subfields. A user-friendly, visualization-enhanced interface will be provided to facilitate interactive exploration of a bibliographic database. This project will serve as an example to demonstrate the power of links in information network mining. Since the dataset is large and the network is heterogeneous, such a study will benefit the research on the analysis of massive heterogeneous information networks.

Explore More