Junjie Yao | Researchain

Publication

Featured research published by Junjie Yao.

international conference on data engineering | 2013

A unified model for stable and temporal topic detection from social media data

Hongzhi Yin; Bin Cui; Hua Lu; Yuxin Huang; Junjie Yao

Web 2.0 users generate and spread huge amounts of messages in online social media. Such user-generated content is a mixture of temporal topics (e.g., breaking events) and stable topics (e.g., user interests). Due to their different natures, it is important and useful to distinguish temporal topics from stable topics in social media. However, such discrimination is very challenging because user-generated texts in social media are very short and thus lack the linguistic features needed for precise analysis with traditional approaches. In this paper, we propose a novel solution to detect both stable and temporal topics simultaneously from social media data. Specifically, a unified user-temporal mixture model is proposed to distinguish temporal topics from stable topics. To improve this model's performance, we design a regularization framework that exploits prior spatial information in a social network, as well as a burst-weighted smoothing scheme that exploits temporal prior information in the time dimension. We conduct extensive experiments to evaluate our proposal on two real data sets obtained from Del.icio.us and Twitter. The experimental results verify that our mixture model is able to distinguish temporal topics from stable topics in a single detection process. Our mixture model, enhanced with the spatial regularization and the burst-weighted smoothing scheme, significantly outperforms competitor approaches in terms of topic detection accuracy and discrimination between stable and temporal topics.
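The basic idea of separating stable from temporal topics can be illustrated with a deliberately simplified two-component mixture. Each word is assumed to come either from a user's stable-interest distribution or from the current time slice's temporal distribution, mixed with a hypothetical weight `lam`. The sketch below computes the posterior (E-step style) responsibility of the temporal component for a single word. It omits the paper's latent topics, spatial regularization and burst-weighted smoothing; the function name, parameters and toy distributions are illustrative assumptions, not the authors' model.

```python
from typing import Dict


def temporal_responsibility(word: str,
                            user_dist: Dict[str, float],
                            slice_dist: Dict[str, float],
                            lam: float = 0.5) -> float:
    """Posterior probability that `word` came from the temporal component.

    Hypothetical two-component mixture (not the paper's full model):
        p(w) = lam * p(w | time slice) + (1 - lam) * p(w | user)
    """
    p_temporal = lam * slice_dist.get(word, 0.0)
    p_stable = (1.0 - lam) * user_dist.get(word, 0.0)
    total = p_temporal + p_stable
    if total == 0.0:
        # Word unseen by both components: fall back to the prior weight.
        return lam
    return p_temporal / total


# Toy distributions, made up for illustration only.
user_interests = {"python": 0.4, "music": 0.3, "earthquake": 0.01}
current_slice = {"earthquake": 0.5, "python": 0.05}

for w in ("earthquake", "python", "unseen"):
    print(w, round(temporal_responsibility(w, user_interests, current_slice), 3))
```

A word that is frequent in the current slice but rare in the user's history ("earthquake" here) is attributed mostly to the temporal component, while a long-standing interest ("python") stays with the stable component.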

international conference on management of data | 2014

Parallel subgraph listing in a large-scale graph

Yingxia Shao; Bin Cui; Lei Chen; Lin Ma; Junjie Yao; Ning Xu

Subgraph listing is a fundamental operation in many graph and network analyses. The problem itself is computationally expensive and has been well studied in centralized processing algorithms. However, centralized solutions do not scale well to large graphs. Recently, several parallel approaches have been introduced to handle large graphs. Unfortunately, these parallel approaches still rely on expensive join operations and thus cannot achieve high performance. In this paper, we design a novel parallel subgraph listing framework, named PSgL. PSgL iteratively enumerates subgraph instances and solves subgraph listing in a divide-and-conquer fashion. The framework relies entirely on graph traversal and avoids explicit join operations. Moreover, to improve its performance, we propose several solutions to balance the workload and reduce the size of intermediate results. Specifically, we prove that the problem of partial subgraph instance distribution for workload balance is NP-hard, and carefully design a set of heuristic strategies. To further reduce the enormous intermediate results, we introduce three independent mechanisms: automorphism breaking of the pattern graph, initial pattern vertex selection based on a cost model, and a pruning method based on a lightweight index. We have implemented a prototype of PSgL and run comprehensive experiments of various graph listing operations on diverse large graphs. The experiments clearly demonstrate that PSgL is robust and achieves performance gains of up to 90% over state-of-the-art solutions.
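As a single-machine illustration of traversal-based listing with automorphism breaking (not PSgL itself, which is distributed and adds workload balancing, cost-based start-vertex selection and index-based pruning), the sketch below lists triangles by extending a partial instance one vertex at a time along adjacency lists, with no join step. The ordering constraint u < v < w is the symmetry-breaking rule for the triangle pattern: the pattern has six automorphisms, and the constraint keeps exactly one of the six equivalent matches. The function name and edge-list input format are assumptions for illustration.

```python
from typing import Dict, List, Set, Tuple


def list_triangles(edges: List[Tuple[int, int]]) -> List[Tuple[int, int, int]]:
    """List each triangle exactly once by ordered graph traversal."""
    adj: Dict[int, Set[int]] = {}
    for a, b in edges:
        if a == b:
            continue  # ignore self-loops
        adj.setdefault(a, set()).add(b)
        adj.setdefault(b, set()).add(a)

    result: List[Tuple[int, int, int]] = []
    for u in sorted(adj):
        # Extend the partial instance (u) only to larger ids: v > u.
        for v in sorted(x for x in adj[u] if x > u):
            # Extend (u, v) to w > v; symmetry breaking keeps one match.
            for w in sorted(x for x in adj[v] if x > v):
                if w in adj[u]:  # closing edge u-w completes the triangle
                    result.append((u, v, w))
    return result


k4 = [(0, 1), (0, 2), (0, 3), (1, 2), (1, 3), (2, 3)]
print(list_triangles(k4))
```

Without the ordering constraint, the same traversal would report every triangle six times, which is the kind of redundant intermediate result that automorphism breaking is meant to remove.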

World Wide Web | 2012

Bursty event detection from collaborative tags

Junjie Yao; Bin Cui; Yuxin Huang; Yanhong Zhou

Collaborative tagging has emerged as a ubiquitous way to annotate and organize online resources. As a kind of descriptive keyword, large numbers of tags are created and associated with multiple types of resources, e.g., web pages, photos, videos and tweets. Users’ tagging actions over time reflect their changing interests. Monitoring and analyzing the temporal patterns of tags can provide important insights for tracing hot topics on the web. Existing work focuses on deriving temporal patterns for individual tags. However, there are remarkable correlations among tags assigned to online resources. In this paper, we propose a new approach to detect bursty tagging events, which captures the relations among a group of correlated tags where the tags are either bursty or associated with bursty tag co-occurrence. Such a bursty tagging event generally corresponds to a real-life event, and it profiles the event with more representative and comprehensible clues. The proposed approach is divided into three stages. First, we exploit sliding time intervals to extract bursty features; then, we adopt graph clustering techniques to group the bursty features into meaningful bursty events. We discuss the choice of similarity and granularity for event detection. Finally, we utilize an automatically generated tag taxonomy to organize bursty events, facilitating burst-oriented navigation and analysis. The experimental study on a large real data set demonstrates the superiority of our new approach.
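The first two stages can be sketched roughly as follows, under stated assumptions. A tag is treated as bursty when its count in the latest interval exceeds the mean plus `k` standard deviations of the preceding `window` intervals (a hypothetical rule, not necessarily the paper's), and connected components of a thresholded co-occurrence graph stand in for the paper's graph clustering step. The paper also treats bursty tag co-occurrences as features; this sketch uses only bursty individual tags. All names, thresholds and toy counts are illustrative.

```python
from collections import defaultdict
from statistics import mean, pstdev
from typing import Dict, List, Set, Tuple


def bursty_tags(counts: Dict[str, List[int]], window: int = 3,
                k: float = 2.0) -> Set[str]:
    """Tags whose latest count exceeds mean + k * std of the previous
    `window` intervals (hypothetical burst rule)."""
    bursty = set()
    for tag, series in counts.items():
        if len(series) <= window:
            continue  # not enough history to judge a burst
        history = series[-window - 1:-1]
        if series[-1] > mean(history) + k * pstdev(history):
            bursty.add(tag)
    return bursty


def group_events(bursty: Set[str], cooccur: Dict[Tuple[str, str], int],
                 min_co: int = 2) -> List[Set[str]]:
    """Connected components over bursty tags, linking two tags when they
    co-occur at least `min_co` times (stand-in for graph clustering)."""
    adj: Dict[str, Set[str]] = defaultdict(set)
    for (a, b), c in cooccur.items():
        if a in bursty and b in bursty and c >= min_co:
            adj[a].add(b)
            adj[b].add(a)
    seen: Set[str] = set()
    events: List[Set[str]] = []
    for start in sorted(bursty):
        if start in seen:
            continue
        component, stack = set(), [start]
        while stack:  # iterative depth-first search
            tag = stack.pop()
            if tag in seen:
                continue
            seen.add(tag)
            component.add(tag)
            stack.extend(adj[tag] - seen)
        events.append(component)
    return events


# Toy per-interval tag counts and co-occurrence counts (made up).
counts = {"earthquake": [1, 2, 1, 20], "tsunami": [0, 1, 0, 15],
          "python": [10, 11, 10, 11], "election": [2, 2, 3, 30]}
cooccur = {("earthquake", "tsunami"): 8, ("earthquake", "python"): 5,
           ("election", "tsunami"): 1}
burst = bursty_tags(counts)
print(burst, group_events(burst, cooccur))
```

The steady tag "python" is not flagged, and the strongly co-occurring bursty pair "earthquake"/"tsunami" ends up in one event while "election" forms its own.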

international conference on management of data | 2015

Community Level Diffusion Extraction

Zhiting Hu; Junjie Yao; Bin Cui; Eric P. Xing

How does online content propagate on social networks? Billions of users generate, consume, and spread vast amounts of information every day. These dynamics, at an unprecedented scale, have become an invaluable reflection of our zeitgeist. However, most existing diffusion extraction work operates only at the individual user level and cannot obtain comprehensive clues. This paper introduces a new approach, COmmunity Level Diffusion (COLD), to uncover and explore temporal diffusion. We model topics and communities in a unified latent framework and extract inter-community influence dynamics. With a well-designed multi-component model structure and a parallel inference implementation on GraphLab, the COLD method is expressive while remaining efficient. The extracted community-level patterns enable diffusion exploration from a new perspective. We leverage these compact yet robust representations to develop new prediction and analysis applications. Extensive experiments on large social datasets show significant improvement in prediction accuracy. We also find that communities play very different roles in diffusion processes depending on their interests. Our method maintains high scalability as data size increases.

web information systems engineering | 2010

Evolutionary taxonomy construction from dynamic tag space

Bin Cui; Junjie Yao; Gao Cong; Yuxin Huang

Collaborative tagging allows users to tag online resources. We refer to the large database of tags and their relationships as a tag space. In a tag space, the popularity of and correlations among tags capture current social interests, and a taxonomy is a useful way to organize these tags. As tags change over time, it is imperative to incorporate temporal tag evolution into the taxonomies. In this paper, we formalize the problem of evolutionary taxonomy generation over a large database of tags. The proposed evolutionary taxonomy framework has two key features. First, we develop a novel context-aware edge selection algorithm for taxonomy extraction. Second, we propose several algorithms for evolutionary taxonomy fusion. We conduct an extensive performance study using a very large real-life dataset (i.e., Del.icio.us). The empirical results clearly show that our approach is effective and efficient.

web information systems engineering | 2013

A Multiple Feature Integration Model to Infer Occupation from Social Media Records

Xiang Wang; Lele Yu; Junjie Yao; Bin Cui

With the rapid development of social media applications, many users are connected with friends, and their daily lives and opinions are recorded. Social media provides an unprecedented way to collect and analyze information about billions of users. Proper user attribute identification, or profile inference, has become increasingly attractive and feasible. However, the flourishing social records also pose a great challenge for effective feature selection and integration in user profile inference, mainly because of text sparsity and complex community structures.

conference on information and knowledge management | 2009

Constructing evolutionary taxonomy of collaborative tagging systems

Junjie Yao; Yuxin Huang; Bin Cui

Collaborative tagging systems allow users to label online resources. The tags are generally correlated and evolve with changes in web content, and the popularity of tags represents the evolution of social interests. A tag taxonomy is a promising way to organize the data in tagging systems. In this demonstration, we propose to construct an evolutionary taxonomy that incorporates the correlation and evolution of tags as user-generated tags grow and change over time. We demonstrate that our approach is intuitive and efficient for tag organization, exploiting the evolving characteristics of collaborative tagging systems.

database systems for advanced applications | 2011

Modeling user expertise in folksonomies by fusing multi-type features

Junjie Yao; Bin Cui; Qiaosha Han; Ce Zhang; Yanhong Zhou

A folksonomy is an online collaborative tagging system that offers a new, open platform for content annotation with an uncontrolled vocabulary. As folksonomies gain in popularity, expert search and spammer detection in folksonomies attract more and more attention. However, most previous work is limited to a subset of folksonomy features. In this paper, we introduce a generic and flexible user expertise model for expert search and spammer detection. We first investigate a comprehensive set of expertise evidence related to users, objects and tags in folksonomies. Then we discuss the rich interactions among them and propose a unified Continuous CRF model to integrate these features and interactions. Applications of this model to expert recommendation and spammer detection are also explored. Extensive experiments conducted on a real tagging dataset demonstrate the model's advantages over previous methods, in both performance and coverage.

international conference on management of data | 2012

Temporal provenance discovery in micro-blog message streams (abstract only)

Zijun Xue; Junjie Yao; Bin Cui

Recent years have witnessed the rapid growth of micro-blog applications. Prominent examples include Twitter, Facebook's status updates, and Sina Weibo in China. Messages in these applications are short (up to 140 characters per message) and easy to create. Subscription and re-sharing features also make them easy to propagate. Micro-blog applications provide abundant information that reflects world-scale user interests and the social pulse in an unexpected way. However, this valuable corpus also contains noise and fast-changing fragments that hinder effective understanding and management. In this work, we propose a micro-blog provenance model to capture temporal connections among micro-blog messages. Here, provenance refers to data origin identification and transformation logging, which has proven to be of great value in recent database and workflow systems. The provenance model explicitly represents the development trail and changes of messages. We select various types of connections in micro-blog applications to identify provenance. To cope with the real-time deluge of micro-blog messages, we discuss a novel message grouping approach to encode and maintain provenance information. A summary index structure is used to enable efficient provenance updating. We collect incoming messages and compare them with an in-memory index to associate them with related ones. Closely related messages form a virtual provenance representation at a coarse granularity. We periodically dump in-memory data to disk. In the actual implementation, we also introduce several adaptive pruning strategies to improve the efficiency of provenance discovery. We use temporal decay and granularity levels to filter out low-probability messages. In the demonstration, we show the usefulness of provenance information for rich query retrieval and dynamic message tracking, enabling effective message organization.
The real-time collection approach shows advantages over several baselines. Experiments conducted on a real dataset verify the effectiveness and efficiency of our provenance approach. Results show that the partial-indexing strategy and the other restriction strategies can maintain accuracy at 90% and a return rate of 60% with reasonably low memory usage. This is the first work towards provenance-based indexing support for micro-blog platforms.
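The grouping-with-decay idea can be sketched as an in-memory inverted index from terms to message groups. Under assumptions that do not come from the paper, an incoming message is scored against candidate groups by Jaccard term overlap multiplied by an exponential decay exp(-decay * Δt) since the group's last update. It joins the best group if the score clears `join_threshold`; otherwise it starts a new group. `prune` evicts groups whose decayed weight falls below `prune_below`, standing in for the periodic dump to disk. The class names, parameters and scoring rule are all hypothetical.

```python
import math
from collections import defaultdict
from dataclasses import dataclass, field
from typing import Dict, List, Optional, Set


@dataclass
class Group:
    terms: Set[str]
    last_time: float
    members: List[int] = field(default_factory=list)


class ProvenanceIndex:
    """Hypothetical in-memory summary index: term -> ids of message groups."""

    def __init__(self, decay: float = 0.1, join_threshold: float = 0.3,
                 prune_below: float = 0.01) -> None:
        self.decay = decay
        self.join_threshold = join_threshold
        self.prune_below = prune_below
        self.groups: Dict[int, Group] = {}
        self.term_index: Dict[str, Set[int]] = defaultdict(set)
        self._next_id = 0

    def _weight(self, group: Group, now: float) -> float:
        # Exponential temporal decay since the group was last updated.
        return math.exp(-self.decay * (now - group.last_time))

    def add(self, msg_id: int, terms: Set[str], now: float) -> int:
        """Attach a message to its best-matching group, or start a new one."""
        candidates: Set[int] = set()
        for t in terms:
            candidates |= self.term_index.get(t, set())
        best_gid: Optional[int] = None
        best_score = 0.0
        for gid in candidates:
            g = self.groups[gid]
            overlap = len(terms & g.terms) / len(terms | g.terms)  # Jaccard
            score = overlap * self._weight(g, now)
            if score > best_score:
                best_gid, best_score = gid, score
        if best_gid is None or best_score < self.join_threshold:
            best_gid = self._next_id
            self._next_id += 1
            self.groups[best_gid] = Group(set(), now)
        g = self.groups[best_gid]
        g.terms |= terms
        g.last_time = now
        g.members.append(msg_id)
        for t in terms:
            self.term_index[t].add(best_gid)
        return best_gid

    def prune(self, now: float) -> List[int]:
        """Evict groups whose decayed weight fell below `prune_below`."""
        stale = [gid for gid, g in self.groups.items()
                 if self._weight(g, now) < self.prune_below]
        for gid in stale:
            for t in self.groups[gid].terms:
                self.term_index[t].discard(gid)
            del self.groups[gid]
        return stale


index = ProvenanceIndex(decay=0.1, join_threshold=0.3, prune_below=0.01)
g_a = index.add(1, {"quake", "tokyo"}, now=0.0)
g_b = index.add(2, {"quake", "tokyo", "tsunami"}, now=1.0)
g_c = index.add(3, {"election", "vote"}, now=30.0)
evicted = index.prune(now=50.0)
print(g_a, g_b, g_c, evicted)
```

Here the second message joins the first message's group because it shares most terms and arrives soon after, while the older group is later evicted once its decayed weight drops below the floor.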

international conference on data engineering | 2009