Zi Yang
Tsinghua University
Publication
Featured research published by Zi Yang.
knowledge discovery and data mining | 2009
Jie Tang; Jimeng Sun; Chi Wang; Zi Yang
In large social networks, nodes (users, entities) are influenced by others for various reasons. For example, colleagues have a strong influence on one's work, while friends have a strong influence on one's daily life. How to differentiate the social influences from different angles (topics)? How to quantify the strength of those social influences? How to estimate the model on real large networks? To address these fundamental questions, we propose Topical Affinity Propagation (TAP) to model the topic-level social influence on large networks. In particular, TAP can take the results of any topic modeling and the existing network structure to perform topic-level influence propagation. With the help of the influence analysis, we present several important applications on real data sets, such as 1) what are the representative nodes on a given topic? 2) how to identify the social influences of neighboring nodes on a particular node? To scale to real large networks, TAP is designed with efficient distributed learning algorithms that are implemented and tested under the Map-Reduce framework. We further present the common characteristics of distributed learning algorithms for Map-Reduce. Finally, we demonstrate the effectiveness and efficiency of TAP on real large data sets.
international acm sigir conference on research and development in information retrieval | 2011
Zi Yang; Keke Cai; Jie Tang; Li Zhang; Zhong Su; Juanzi Li
We study a novel problem of social context summarization for Web documents. Traditional summarization research has focused on extracting informative sentences from standard documents. With the rapid growth of online social networks, abundant user-generated content (e.g., comments) associated with the standard documents is available. Which parts of a document do social users really care about? How can we generate summaries for standard documents by considering both the informativeness of sentences and the interests of social users? This paper explores such an approach by modeling Web documents and social contexts in a unified framework. We propose a dual wing factor graph (DWFG) model, which utilizes the mutual reinforcement between Web documents and their associated social contexts to generate summaries. An efficient algorithm is designed to learn the proposed factor graph model. Experimental results on a Twitter data set validate the effectiveness of the proposed model. By leveraging the social context information, our approach obtains significant improvement (on average, +5.0% to +17.3%) over several alternative methods (CRF, SVM, LR, PR, and DocLead) on the performance of summarization.
Machine Learning | 2011
Jie Tang; Jing Zhang; Ruoming Jin; Zi Yang; Keke Cai; Li Zhang; Zhong Su
In this paper, we present a topic level expertise search framework for heterogeneous networks. Different from the traditional Web search engines that perform retrieval and ranking at document level (or at object level), we investigate the problem of expertise search at topic level over heterogeneous networks. In particular, we study this problem in an academic search and mining system, which extracts and integrates the academic data from the distributed Web. We present a unified topic model to simultaneously model topical aspects of different objects in the academic network. Based on the learned topic models, we investigate the expertise search problem from three dimensions: ranking, citation tracing analysis, and topical graph search. Specifically, we propose a topic level random walk method for ranking the different objects. In citation tracing analysis, we aim to uncover how a piece of work influences its follow-up work. Finally, we have developed a topical graph search function, based on the topic modeling and citation tracing analysis. Experimental results show that various expertise search and mining tasks can indeed benefit from the proposed topic level analysis approach.
web age information management | 2008
Jing Zhang; Jie Tang; Bangyong Liang; Zi Yang; Sijie Wang; Jingjing Zuo; Juanzi Li
As Web content has shifted from homogeneous to heterogeneous, recommendation has become a more challenging issue. In this paper, we investigate the recommendation problem on a general heterogeneous Web social network. We categorize the recommendation needs on it into two main scenarios: recommendation when a person is doing a search and recommendation when the person is browsing information. We formalize recommendation as a ranking problem over the heterogeneous network. Moreover, we propose using a random walk model to simultaneously rank different types of objects and propose a pair-wise learning algorithm to learn the weight of each type of relationship in the model. Experimental results on two real-world data sets show improvements over the baseline methods.
web-age information management | 2009
Zi Yang; Jie Tang; Jing Zhang; Juanzi Li; Bo Gao
In this paper, we study the problem of topic-level random walk, which concerns the random walk at the topic level. Previously, several related works such as topic-sensitive PageRank have been conducted. However, topics in these methods were predefined, which makes the methods inapplicable to different domains. In this paper, we propose a four-step approach for topic-level random walk. We employ a probabilistic topic model to automatically extract topics from documents. Then we perform the random walk at the topic level. We also propose an approach to model topics of the query and then combine the random walk ranking score with the relevance score based on the modeling results. Experimental results on a real-world data set show that our proposed approach can significantly outperform baseline methods that use a language model or traditional PageRank.
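The abstract does not give the update rule, so the following is only a minimal sketch of one way a topic-level random walk could look, not the authors' method. It assumes each document already has a topic distribution from some topic model, biases each topic's walk toward link targets that are strong in that topic, and combines the result with a relevance score by simple multiplication. The function names, the damping factor, and the multiplicative combination are all assumptions made for illustration.

```python
def topic_level_pagerank(links, doc_topics, damping=0.85, iterations=50):
    """Run one PageRank-style walk per topic (illustrative sketch only).

    links: dict mapping a document to the list of documents it links to.
    doc_topics: dict mapping a document to its list of P(topic | document).
    Returns a dict mapping each document to a list of per-topic rank scores.
    """
    docs = sorted(doc_topics)
    n = len(docs)
    n_topics = len(doc_topics[docs[0]])
    ranks = {d: [1.0 / n] * n_topics for d in docs}
    for _ in range(iterations):
        new = {d: [(1.0 - damping) / n] * n_topics for d in docs}
        for z in range(n_topics):
            for src in docs:
                targets = links.get(src, [])
                # Assumption: transitions favor targets that are strong in topic z.
                weights = [doc_topics[t][z] for t in targets]
                total = sum(weights)
                if total == 0:
                    # No usable out-links: spread this node's mass uniformly.
                    for d in docs:
                        new[d][z] += damping * ranks[src][z] / n
                    continue
                for t, w in zip(targets, weights):
                    new[t][z] += damping * ranks[src][z] * w / total
        ranks = new
    return ranks


def combined_score(ranks, query_topics, relevance):
    """Combine topic-level ranks with a query relevance score.

    Assumption: the query's topic distribution weights the per-topic ranks,
    and the result is multiplied by the document's relevance score.
    """
    return {
        d: relevance.get(d, 0.0) * sum(q * r for q, r in zip(query_topics, ranks[d]))
        for d in ranks
    }


if __name__ == "__main__":
    # Hypothetical toy data: three linked documents and two topics.
    toy_links = {"a": ["b"], "b": ["c"], "c": ["a", "b"]}
    toy_topics = {"a": [0.9, 0.1], "b": [0.5, 0.5], "c": [0.1, 0.9]}
    toy_ranks = topic_level_pagerank(toy_links, toy_topics)
    print(combined_score(toy_ranks, [0.8, 0.2], {"a": 0.4, "b": 0.7, "c": 0.1}))
```

Within each topic the rank scores stay normalized to 1, so scores are comparable across documents for a fixed topic. Other combinations, such as a weighted sum of rank and relevance, would fit the abstract's description equally well.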
IEEE Intelligent Systems | 2011
Zi Yang; Jie Tang; Juanzi Li; Wenjun Yang
Formalizing social networks in a factor graph model and using a learning algorithm to estimate the pairwise social influence between nodes can help inform multilevel social community analysis.
Knowledge and Information Systems | 2013
Bo Wang; Jie Tang; Wei Fan; Songcan Chen; Chenhao Tan; Zi Yang
The traditional learning-to-rank problem mainly focuses on a single type of object. However, with the rapid growth of Web 2.0, ranking over multiple interrelated and heterogeneous objects has become a common situation, e.g., in the heterogeneous academic network. In this scenario, one may have much training data for some types of objects (e.g., conferences) but very little for the types of interest (e.g., authors). Thus, the two important questions are: (1) Given a networked data set, how could one borrow supervision from other types of objects in order to build an accurate ranking model for the objects of interest, which have insufficient supervision? (2) If there are links between different objects, how can we exploit their relationships for improved ranking performance? In this work, we first propose a regularized framework called HCDRank to simultaneously minimize two loss functions related to these two domains. Then, we extend the approach by exploiting the link information between heterogeneous objects. We conduct a theoretical analysis of the proposed approach and derive its generalization bound to demonstrate how the two related domains could help each other in learning ranking functions. Experimental results on three different genres of data sets demonstrate the effectiveness of the proposed approaches.
international conference on data mining | 2009
Jie Tang; Jing Zhang; Jeffrey Xu Yu; Zi Yang; Keke Cai; Rui Ma; Li Zhang; Zhong Su
It is well known that Web users create links with different intentions. However, a key question, which is not well studied, is how to categorize the links and how to quantify the strength of the influence of a web page on another if there is a link between the two linked web pages. In this paper, we focus on the problem of link semantics analysis, and propose a novel supervised learning approach to build a model, based on a training link-labeled and link-weighted graph where a link-label represents the category of a link and a link-weight represents the influence of one web page on the other in a link. Based on the model built, we categorize links and quantify the influence of web pages on the others in a large graph in the same application domain. We discuss our proposed approach, namely Pairwise Restricted Boltzmann Machines (PRBMs), and conduct extensive experimental studies to demonstrate the effectiveness of our approach using large real datasets.
international conference on data mining | 2010
Zi Yang; Wei Li; Jie Tang; Juanzi Li
In this paper, we consider a novel problem referred to as term filtering with bounded error, which reduces the term (feature) space by eliminating terms without (or with bounded) information loss. Different from existing works, the obtained term space provides a complete view of the original term space. More interestingly, several important questions can be answered, such as: 1) how different terms interact with each other and 2) how the filtered terms can be represented by the other terms. We perform a theoretical investigation of the term filtering problem, link it to the Geometric Covering By Discs problem, and prove its NP-hardness. We present two novel approaches for both lossless and lossy term filtering with bounds on the introduced error. Experimental results on multiple text mining tasks validate the effectiveness of the proposed approaches.
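To make the covering view of term filtering concrete, here is a minimal sketch of a greedy covering heuristic. It is not one of the two approaches from the paper, which the abstract does not describe. It assumes each term is a numeric vector (for example, per-document weights), measures error as the largest coordinate difference, and keeps a small set of terms so that every term lies within `epsilon` of a kept term. The function name `filter_terms`, the distance measure, and the greedy strategy are all assumptions.

```python
def filter_terms(term_vectors, epsilon):
    """Greedy covering sketch: keep terms so every term is within epsilon of one.

    term_vectors: dict mapping a term to a list of floats of equal length.
    epsilon: maximum allowed per-coordinate error (0 keeps only exact duplicates out).
    Returns (kept, representative), where representative maps every term
    to the kept term that stands in for it.
    """
    def distance(u, v):
        # Assumption: error is the largest absolute coordinate difference.
        return max(abs(a - b) for a, b in zip(u, v))

    terms = sorted(term_vectors)
    # covers[t] holds every term within epsilon of t, including t itself.
    covers = {
        t: {s for s in terms if distance(term_vectors[t], term_vectors[s]) <= epsilon}
        for t in terms
    }
    uncovered = set(terms)
    kept = []
    representative = {}
    while uncovered:
        # Pick the term covering the most uncovered terms; break ties by name.
        best = min(terms, key=lambda t: (-len(covers[t] & uncovered), t))
        for s in covers[best] & uncovered:
            representative[s] = best
        uncovered -= covers[best]
        kept.append(best)
    return kept, representative


if __name__ == "__main__":
    # Hypothetical toy vectors: two near-duplicate terms and one distinct term.
    vectors = {"car": [1.0, 0.0], "auto": [1.0, 0.1], "tree": [0.0, 1.0]}
    print(filter_terms(vectors, epsilon=0.2))
```

Because the underlying covering problem is NP-hard, as the abstract states, a greedy pass like this gives no guarantee of the smallest possible term set; it only guarantees that each removed term has a kept representative within the error bound.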
conference on information and knowledge management | 2010
Zi Yang; Jingyi Guo; Keke Cai; Jie Tang; Juanzi Li; Li Zhang; Zhong Su