Liangjie Hong
Lehigh University
Publication
Featured research published by Liangjie Hong.
knowledge discovery and data mining | 2010
Liangjie Hong; Brian D. Davison
Social networks such as Facebook, LinkedIn, and Twitter have been a crucial source of information for a wide spectrum of users. In Twitter, popular information that is deemed important by the community propagates through the network. Studying the characteristics of content in the messages becomes important for a number of tasks, such as breaking news detection, personalized message recommendation, friend recommendation, sentiment analysis and others. While many researchers wish to use standard text mining tools to understand messages on Twitter, the restricted length of those messages prevents these tools from being employed to their full potential. We address the problem of using standard topic models in micro-blogging environments by studying how the models can be trained on the dataset. We propose several schemes to train a standard topic model and compare their quality and effectiveness through a set of carefully designed experiments from both qualitative and quantitative perspectives. We show that by training a topic model on aggregated messages we can obtain a higher-quality learned model, which results in significantly better performance in two real-world classification problems. We also discuss how the state-of-the-art Author-Topic model fails to model hierarchical relationships between entities in social media.
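The idea of training on aggregated messages can be illustrated with a minimal sketch. It assumes aggregation by author, which is only one possible scheme and is not confirmed by the abstract; the function name `aggregate_by_author` and the sample data are hypothetical. The resulting longer pseudo-documents would then be passed to any standard topic model.

```python
from collections import defaultdict


def aggregate_by_author(messages):
    """Group short messages into one pseudo-document per author.

    `messages` is an iterable of (author, text) pairs. Both the name and
    the per-author scheme are illustrative assumptions, not the paper's code.
    """
    grouped = defaultdict(list)
    for author, text in messages:
        grouped[author].append(text)
    # Join each author's messages into a single, longer training document.
    return {author: " ".join(texts) for author, texts in grouped.items()}


# Hypothetical sample data.
messages = [
    ("alice", "earthquake reported downtown"),
    ("bob", "new phone launch today"),
    ("alice", "aftershock felt near the river"),
]
docs = aggregate_by_author(messages)
```

Each value in `docs` is longer than any single message, which is the property the aggregation step is meant to provide before topic-model training.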
international acm sigir conference on research and development in information retrieval | 2009
Liangjie Hong; Brian D. Davison
Discussion boards and online forums are important platforms for people to share information. Users post questions or problems onto discussion boards and rely on others to provide possible solutions and such question-related content sometimes even dominates the whole discussion board. However, to retrieve this kind of information automatically and effectively is still a non-trivial task. In addition, the existence of other types of information (e.g., announcements, plans, elaborations, etc.) makes it difficult to assume that every thread in a discussion board is about a question. We consider the problems of identifying question-related threads and their potential answers as classification tasks. Experimental results across multiple datasets demonstrate that our method can significantly improve the performance in both question detection and answer finding subtasks. We also do a careful comparison of how different types of features contribute to the final result and show that non-content features play a key role in improving overall performance. Finally, we show that a ranking scheme based on our classification approach can yield much better performance than prior published methods.
knowledge discovery and data mining | 2011
Liangjie Hong; Byron Dom; Siva Gurumurthy; Kostas Tsioutsiouliklis
In recent years social media have become indispensable tools for information dissemination, operating in tandem with traditional media outlets such as newspapers, and it has become critical to understand the interaction between the new and old sources of news. Although social media as well as traditional media have attracted attention from several research communities, most of the prior work has been limited to a single medium. In addition temporal analysis of these sources can provide an understanding of how information spreads and evolves. Modeling temporal dynamics while considering multiple sources is a challenging research problem. In this paper we address the problem of modeling text streams from two news sources - Twitter and Yahoo! News. Our analysis addresses both their individual properties (including temporal dynamics) and their inter-relationships. This work extends standard topic models by allowing each text stream to have both local topics and shared topics. For temporal modeling we associate each topic with a time-dependent function that characterizes its popularity over time. By integrating the two models, we effectively model the temporal dynamics of multiple correlated text streams in a unified framework. We evaluate our model on a large-scale dataset, consisting of text streams from both Twitter and news feeds from Yahoo! News. Besides overcoming the limitations of existing models, we show that our work achieves better perplexity on unseen data and identifies more coherent topics. We also provide analysis of finding real-world events from the topics obtained by our model.
conference on information and knowledge management | 2011
Dawei Yin; Liangjie Hong; Brian D. Davison
With hundreds of millions of participants, social media services have become commonplace. Unlike a traditional social network service, a microblogging network like Twitter is a hybrid network, combining aspects of both social networks and information networks. Understanding the structure of such hybrid networks and predicting new links are important for many tasks such as friend recommendation, community detection, and modeling network growth. We note that the link prediction problem in a hybrid network differs from that in previously studied networks. Unlike information networks and traditional online social networks, the structures in a hybrid network are more complicated and informative. We compare the most popular and recent methods and principles for link prediction and recommendation. Finally, we propose a novel structure-based personalized link prediction model and compare its predictive performance against many fundamental and popular link prediction methods on real-world data from the Twitter microblogging network. Our experiments on both static and dynamic data sets show that our methods noticeably outperform the state-of-the-art.
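For readers unfamiliar with structure-based link prediction, the following is a minimal sketch of a common-neighbor style baseline on a directed follow graph. It is not the model proposed in the paper, and the abstract does not say which baselines were used; the function name `recommend` and the sample graph are hypothetical.

```python
from collections import Counter


def recommend(follows, user, top_k=3):
    """Rank accounts that `user` does not yet follow.

    A candidate's score is the number of `user`'s followees who follow it.
    `follows` maps each account to the set of accounts it follows.
    """
    followees = follows.get(user, set())
    scores = Counter()
    for followee in followees:
        for candidate in follows.get(followee, set()):
            # Skip the user themself and accounts already followed.
            if candidate != user and candidate not in followees:
                scores[candidate] += 1
    return scores.most_common(top_k)


# Hypothetical follow graph.
follows = {
    "ann": {"bob", "cat"},
    "bob": {"dan", "eve"},
    "cat": {"dan"},
    "dan": {"ann"},
}
result = recommend(follows, "ann")
```

Here `dan` is followed by two of `ann`'s followees and `eve` by one, so `dan` ranks first.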
international acm sigir conference on research and development in information retrieval | 2012
Liangjie Hong; Ron Bekkerman; Joseph Adler; Brian D. Davison
As online social media become more deeply integrated into our lives, we spend more time consuming social update streams that come from our online connections. Although social update streams provide a tremendous opportunity for us to access information on-the-fly, we often complain about their relevance. Some of us are flooded with a steady stream of information and simply cannot process it in full. Ranking the incoming content becomes the only solution for the overwhelmed users. For some others, in contrast, the incoming information stream is pretty weak, and they have to actively search for relevant information, which is quite tedious. For these users, augmenting their incoming content flow with relevant information from outside their first-degree network would be a viable solution. In that case, the problem of relevance becomes even more prominent. In this paper, we start an open discussion on how to build effective systems for ranking social updates from the unique perspective of LinkedIn, the largest professional network in the world. More specifically, we address this problem as an intersection of learning to rank, collaborative filtering, and clickthrough modeling, while leveraging ideas from information retrieval and recommender systems. We propose a novel probabilistic latent factor model with regressions on explicit features and compare it with a number of non-trivial baselines. In addition to demonstrating superior performance of our model, we shed some light on the nature of social updates on LinkedIn and how users interact with them, which might be applicable to social update streams in general.
knowledge discovery and data mining | 2011
Liangjie Hong; Dawei Yin; Jian Guo; Brian D. Davison
Text corpora with documents from a range of time epochs are natural and ubiquitous in many fields, such as research papers, newspaper articles and a variety of types of recently emerged social media. People not only would like to know what kind of topics can be found from these data sources but also wish to understand the temporal dynamics of these topics and predict certain properties of terms or documents in the future. Topic models are usually utilized to find latent topics from text collections, and recently have been applied to temporal text corpora. However, most proposed models are general purpose models to which no real tasks are explicitly associated. Therefore, current models may be difficult to apply in real-world applications, such as the problems of tracking trends and predicting popularity of keywords. In this paper, we introduce a real-world task, tracking trends of terms, to which temporal topic models can be applied. Rather than building a general-purpose model, we propose a new type of topic model that incorporates the volume of terms into the temporal dynamics of topics and optimizes estimates of term volumes. In existing models, trends are either latent variables or not considered at all which limits the potential for practical use of trend information. In contrast, we combine state-space models with term volumes with a supervised learning model, enabling us to effectively predict the volume in the future, even without new documents. In addition, it is straightforward to obtain the volume of latent topics as a by-product of our model, demonstrating the superiority of utilizing temporal topic models over traditional time-series tools (e.g., autoregressive models) to tackle this kind of problem. The proposed model can be further extended with arbitrary word-level features which are evolving over time. We present the results of applying the model to two datasets with long time periods and show its effectiveness over non-trivial baselines.
computational science and engineering | 2009
Liangjie Hong; Zaihan Yang; Brian D. Davison
Community-driven Question Answering services are gaining increasing attention with tens of millions of users and hundreds of millions of posts in recent years. Given the size of these services, there is a need for users to be able to search these large question answer archives and retrieve high quality content. Research work shows that user reputation modeling makes a contribution when incorporated with relevance models. However, the effectiveness of different link analysis approaches and how to embed topical information (a user may have different expertise in various areas) are still open questions. In this work, we address these two research questions by first reviewing different link analysis schemes, especially discussing the use of PageRank-based methods since they are less commonly utilized in user reputation modeling. We also introduce Topical PageRank analysis for modeling user reputation on different topics. Comparative experimental results on data from Yahoo! Answers show that PageRank-based approaches are more effective than HITS-like schemes and other heuristics, and that topical link analysis can improve performance.
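As a rough illustration of PageRank-based reputation, the sketch below runs plain power-iteration PageRank on a small graph with an edge from each asker to each user who answered them. The edge direction, the damping factor of 0.85, and all names are assumptions made for illustration; this is not the paper's implementation and does not include the topical variant.

```python
def pagerank(edges, damping=0.85, iterations=50):
    """Plain PageRank by power iteration over (source, target) edges."""
    nodes = sorted({node for edge in edges for node in edge})
    out_links = {node: [] for node in nodes}
    for src, dst in edges:
        out_links[src].append(dst)

    n = len(nodes)
    rank = {node: 1.0 / n for node in nodes}
    for _ in range(iterations):
        # Rank held by nodes with no out-links is spread uniformly.
        dangling = sum(rank[node] for node in nodes if not out_links[node])
        new_rank = {
            node: (1 - damping) / n + damping * dangling / n for node in nodes
        }
        for src in nodes:
            for dst in out_links[src]:
                new_rank[dst] += damping * rank[src] / len(out_links[src])
        rank = new_rank
    return rank


# Hypothetical asker -> answerer edges.
edges = [
    ("asker1", "expert"),
    ("asker2", "expert"),
    ("asker2", "helper"),
    ("helper", "expert"),
]
scores = pagerank(edges)
```

In this toy graph the user labelled `expert` receives answers-to links from every other user and ends up with the highest score; the scores sum to one by construction.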
advances in social networks analysis and mining | 2013
Zaihan Yang; Liangjie Hong; Brian D. Davison
We propose a novel probabilistic topic model that jointly models authors, documents, cited authors, and venues simultaneously in one integrated framework, as compared to previous work which embeds fewer components. This model is designed for three typical applications in academic network analysis: the problems of expert ranking, cited author prediction and venue prediction. Experiments based on two real world data sets demonstrate the model to be effective, and it outperforms several state-of-the-art algorithms in all three applications.
international world wide web conferences | 2011
Liangjie Hong; Ovidiu Dan; Brian D. Davison
international world wide web conferences | 2012
Liangjie Hong; Amr Ahmed; Siva Gurumurthy; Alexander J. Smola; Kostas Tsioutsiouliklis