Dawei Yin | Researchain

Publication

Featured research published by Dawei Yin.

conference on information and knowledge management | 2011

Structural link analysis and prediction in microblogs

Dawei Yin; Liangjie Hong; Brian D. Davison

With hundreds of millions of participants, social media services have become commonplace. Unlike a traditional social network service, a microblogging network like Twitter is a hybrid network, combining aspects of both social networks and information networks. Understanding the structure of such hybrid networks and predicting new links are important for many tasks such as friend recommendation, community detection, and modeling network growth. We note that the link prediction problem in a hybrid network differs from that in previously studied networks. Unlike information networks and traditional online social networks, the structures in a hybrid network are more complicated and informative. We compare the most popular and recent methods and principles for link prediction and recommendation. Finally, we propose a novel structure-based personalized link prediction model and compare its predictive performance against many fundamental and popular link prediction methods on real-world data from the Twitter microblogging network. Our experiments on both static and dynamic data sets show that our methods noticeably outperform the state-of-the-art.

knowledge discovery and data mining | 2010

A probabilistic model for personalized tag prediction

Dawei Yin; Zhenzhen Xue; Liangjie Hong; Brian D. Davison

Social tagging systems have become increasingly popular for sharing and organizing web resources. Tag prediction is a common feature of social tagging systems. Social tagging by nature is an incremental process, meaning that once a user has saved a web page with tags, the tagging system can provide more accurate predictions for the user, based on the user's incremental behaviors. However, existing tag prediction methods do not consider this important factor: their training and test datasets are either split by a fixed time stamp or randomly sampled from a larger corpus. In our temporal experiments, we perform a time-sensitive sampling on an existing public dataset, resulting in a new scenario which is much closer to the real world. In this paper, we address the problem of tag prediction by proposing a probabilistic model for personalized tag prediction. The model is a Bayesian approach, and integrates three factors: ego-centric effect, environmental effects and web page content. Two methods, intuitive calculation and learning optimization, are provided for parameter estimation. Pure graph-based methods, which may have significant constraints (such as every user, every item and every tag having to occur in at least p posts), cannot make a prediction in most real-world cases, while our model improves the F-measure by over 30% compared to a leading algorithm in our real-world use case.

knowledge discovery and data mining | 2011

Tracking trends: incorporating term volume into temporal topic models

Liangjie Hong; Dawei Yin; Jian Guo; Brian D. Davison

Text corpora with documents from a range of time epochs are natural and ubiquitous in many fields, such as research papers, newspaper articles and a variety of types of recently emerged social media. People not only would like to know what kind of topics can be found from these data sources but also wish to understand the temporal dynamics of these topics and predict certain properties of terms or documents in the future. Topic models are usually utilized to find latent topics from text collections, and recently have been applied to temporal text corpora. However, most proposed models are general-purpose models to which no real tasks are explicitly associated. Therefore, current models may be difficult to apply in real-world applications, such as the problems of tracking trends and predicting popularity of keywords. In this paper, we introduce a real-world task, tracking trends of terms, to which temporal topic models can be applied. Rather than building a general-purpose model, we propose a new type of topic model that incorporates the volume of terms into the temporal dynamics of topics and optimizes estimates of term volumes. In existing models, trends are either latent variables or not considered at all, which limits the potential for practical use of trend information. In contrast, we combine state-space models with term volumes with a supervised learning model, enabling us to effectively predict the volume in the future, even without new documents. In addition, it is straightforward to obtain the volume of latent topics as a by-product of our model, demonstrating the superiority of utilizing temporal topic models over traditional time-series tools (e.g., autoregressive models) to tackle this kind of problem. The proposed model can be further extended with arbitrary word-level features which are evolving over time. We present the results of applying the model to two datasets with long time periods and show its effectiveness over non-trivial baselines.

international acm sigir conference on research and development in information retrieval | 2011

Link formation analysis in microblogs

Dawei Yin; Liangjie Hong; Xiong Xiong; Brian D. Davison

Unlike a traditional social network service, a microblogging network like Twitter is a hybrid network, combining aspects of both social networks and information networks. Understanding the structure of such hybrid networks and predicting new links are important for many tasks such as friend recommendation, community detection, and network growth models. In this paper, by analyzing data collected over time, we find that 90% of new links are to people just two hops away and that the dynamics of friend acquisition are also related to users' account age. Finally, we compare two popular sampling methods which are widely used for network analysis and find that ForestFire does not preserve properties required for the link prediction task.
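
The two-hop observation in this abstract can be illustrated with a minimal Python sketch. It is not the authors' code: the function names, the dictionary representation of the follow graph, and the toy data are all assumptions made for illustration. It collects the accounts exactly two hops from a user and measures what share of newly formed links point at such accounts.

```python
def two_hop_candidates(follows, user):
    """Accounts reachable in exactly two follow steps from `user`.

    `follows` maps each account to the set of accounts it follows
    (a hypothetical in-memory representation of a directed graph).
    """
    direct = follows.get(user, set())
    candidates = set()
    for friend in direct:
        candidates |= follows.get(friend, set())
    # Drop accounts already followed and the user itself.
    return candidates - direct - {user}


def two_hop_share(follows, new_links):
    """Fraction of new (source, target) links whose target was two hops away."""
    if not new_links:
        return 0.0
    hits = sum(1 for src, dst in new_links
               if dst in two_hop_candidates(follows, src))
    return hits / len(new_links)


# Toy snapshot of a follow graph and the links observed afterwards (made up).
follows = {
    "a": {"b", "c"},
    "b": {"d"},
    "c": {"d", "e"},
    "d": set(),
    "e": {"a"},
}
new_links = [("a", "d"), ("a", "e"), ("b", "e")]

print(sorted(two_hop_candidates(follows, "a")))
print(two_hop_share(follows, new_links))
```

On real data the snapshot and the later links would come from two crawls taken at different times; the sketch only shows the bookkeeping.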

web search and data mining | 2013

Connecting comments and tags: improved modeling of social tagging systems

Dawei Yin; Shengbo Guo; Boris Chidlovskii; Brian D. Davison; Cédric Archambeau; Guillaume Bouchard

Collaborative tagging systems are now deployed extensively to help users share and organize resources. Tag prediction and recommendation can simplify and streamline the user experience, and by modeling user preferences, predictive accuracy can be significantly improved. However, previous methods typically model user behavior based only on a log of prior tags, neglecting other behaviors and information in social tagging systems, e.g., commenting on items and connecting with other users. On the other hand, little is known about the connection and correlations among these behaviors and contexts in social tagging systems. In this paper, we investigate improved modeling for predictive social tagging systems. Our explanatory analyses demonstrate three significant challenges: coupled high order interaction, data sparsity and cold start on items. We tackle these problems by using a generalized latent factor model and fully Bayesian treatment. To evaluate performance, we test on two real-world data sets from Flickr and Bibsonomy. Our experiments on these data sets show that to achieve the best predictive performance, it is necessary to employ a fully Bayesian treatment in modeling high order relations in social tagging systems. Our methods noticeably outperform state-of-the-art approaches.

international world wide web conferences | 2011

Exploiting session-like behaviors in tag prediction

Dawei Yin; Liangjie Hong; Brian D. Davison

In social bookmarking systems, existing methods in tag prediction have shown that the performance of prediction can be significantly improved by modeling users' preferences. However, these preferences are usually treated as constant over time, neglecting the temporal factor within users' behaviors. In this paper, we study the problem of session-like behavior in social tagging systems and demonstrate that the predictive performance can be improved by considering sessions. Experiments conducted on three public datasets show that our session-based method can significantly outperform baselines and two state-of-the-art algorithms.

web search and data mining | 2014

Exploiting contextual factors for click modeling in sponsored search

Dawei Yin; Shike Mei; Bin Cao; Jian-Tao Sun; Brian D. Davison

Sponsored search is the primary business for today's commercial search engines. Accurate prediction of the Click-Through Rate (CTR) for ads is key to displaying relevant ads to users. In this paper, we systematically study the two kinds of contextual factors influencing the CTR: 1) In micro factors, we focus on the factors for mainline ads, including ad depth, query diversity, and ad interaction. 2) In macro factors, we try to understand the correlations of clicks between organic search and sponsored search. Based on this data analysis, we propose novel click models which harvest these newly explored factors. To the best of our knowledge, this is the first paper to examine and model the effects of the above contextual factors in sponsored search. Extensive experiments on large-scale real-world datasets show that by incorporating these contextual factors, our novel click models can outperform state-of-the-art methods.

international acm sigir conference on research and development in information retrieval | 2011

Award prediction with temporal citation network analysis

Zaihan Yang; Dawei Yin; Brian D. Davison

Each year many ACM SIG communities recognize an outstanding researcher through an award in honor of his or her profound impact and numerous research contributions. This work is the first to investigate an automated mechanism to help in selecting future award winners. We approach the problem as a researcher expertise ranking problem, and propose a temporal probabilistic ranking model which combines content with citation network analysis. Experimental results based on real-world citation data and historical awardees indicate that some kinds of SIG awards are well-modeled by this approach.

web search and data mining | 2014

Estimating ad group performance in sponsored search

Dawei Yin; Bin Cao; Jian-Tao Sun; Brian D. Davison

In modern commercial search engines, the pay-per-click (PPC) advertising model is widely used in sponsored search. The search engines try to deliver ads which can produce greater click yields (the total number of clicks for the list of ads per impression). Therefore, predicting user clicks plays a critical role in sponsored search. The current ad-delivery strategy is a two-step approach which first predicts individual ad CTR for the given query and then selects the ads with higher predicted CTR. However, this strategy is naturally suboptimal, and correlation between ads is often ignored under it. The learning problem is focused on predicting individual performance rather than group performance, which is the more important measurement. In this paper, we study click yield measurement in sponsored search and focus on the problem of predicting group performance (click yields) in sponsored search. To tackle the challenges in this problem (depth effects, interactive influence, cold start and sparseness of ad textual information), we first investigate several effects and propose a novel framework that can directly predict group performance for lists of ads. Our extensive experiments on a large-scale real-world dataset from a commercial search engine show that we achieve significant improvement by solving the sponsored search problem from the new perspective. Our methods noticeably outperform existing state-of-the-art approaches.
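
The abstract defines click yield as the total number of clicks on a list of ads per impression. The short Python sketch below computes that quantity from a click log. It is only an illustration of the definition, not the paper's prediction framework; the log format (one 0/1 click vector per impression) and the function names are assumptions.

```python
def click_yield(impressions):
    """Average number of clicks per impression for one ad list.

    `impressions` is a list of click vectors, one per impression,
    with a 0/1 entry for each ad position (hypothetical log format).
    """
    if not impressions:
        return 0.0
    total_clicks = sum(sum(clicks) for clicks in impressions)
    return total_clicks / len(impressions)


def per_position_ctr(impressions):
    """Click-through rate of each ad position, for comparison."""
    if not impressions:
        return []
    n = len(impressions)
    return [sum(column) / n for column in zip(*impressions)]


# Four impressions of the same three-ad list (made-up data).
log = [
    [1, 0, 0],
    [0, 0, 0],
    [1, 1, 0],
    [0, 0, 0],
]

print(click_yield(log))
print(per_position_ctr(log))
```

Measured after the fact, the click yield of a fixed list equals the sum of its per-position CTRs. The abstract's concern is prediction: estimating each ad's CTR separately ignores how the ads in a list influence one another.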

advances in social networks analysis and mining | 2014

Recommendation in academia: a joint multi-relational model

Zaihan Yang; Dawei Yin; Brian D. Davison

In this paper, we target four specific recommendation tasks in the academic environment: coauthor recommendation for authors, paper citation recommendation for authors, paper citation recommendation for papers, and publishing venue recommendation for author-paper pairs. Unlike previous work, which tackles each of these tasks separately while neglecting their mutual effect and connection, we propose a joint multi-relational model that can exploit the latent correlation between relations and solve several tasks in a unified way. Moreover, for better ranking, we extend prior work on maximizing MAP over a single tensor and make it applicable to maximizing MAP over multiple matrices and tensors. Experiments conducted on two real-world data sets demonstrate the effectiveness of our model: 1) improved performance can be achieved with joint modeling over multiple relations; 2) our model can outperform three state-of-the-art algorithms for several tasks.
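
For readers unfamiliar with MAP (mean average precision), the ranking measure this abstract refers to, here is a minimal Python sketch of how it is computed for a set of ranked recommendation lists. It shows only the standard evaluation metric, not the paper's method of optimizing MAP during training; the function names and the toy authors and papers are hypothetical.

```python
def average_precision(ranked, relevant):
    """Average precision of one ranked list against a set of relevant items."""
    if not relevant:
        return 0.0
    hits = 0
    precision_sum = 0.0
    for rank, item in enumerate(ranked, start=1):
        if item in relevant:
            hits += 1
            precision_sum += hits / rank  # precision at this cut-off
    return precision_sum / len(relevant)


def mean_average_precision(rankings, relevance):
    """Mean of average precision over all queries with relevant items."""
    queries = [q for q in rankings if relevance.get(q)]
    if not queries:
        return 0.0
    return sum(average_precision(rankings[q], relevance[q])
               for q in queries) / len(queries)


# Made-up citation recommendations for two authors.
rankings = {
    "author1": ["p1", "p2", "p3"],
    "author2": ["p4", "p5"],
}
relevance = {
    "author1": {"p1", "p3"},
    "author2": {"p5"},
}

print(mean_average_precision(rankings, relevance))
```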
