Hongyan Liu | Researchain

Publication

Featured researches published by Hongyan Liu.

Decision Support Systems | 2006

A new approach to classification based on association rule mining

Guoqing Chen; Hongyan Liu; Lan Yu; Qiang Wei; Xing Zhang

Classification is one of the key issues in the fields of decision sciences and knowledge discovery. This paper presents a new approach for constructing a classifier, based on an extended association rule mining technique in the context of classification. The characteristic of this approach is threefold: first, applying the information gain measure to the generation of candidate itemsets; second, integrating the process of frequent itemsets generation with the process of rule generation; third, incorporating strategies for avoiding rule redundancy and conflicts into the mining process. The corresponding mining algorithm proposed, namely GARC (Gain based Association Rule Classification), produces a classifier with satisfactory classification accuracy, compared with other classifiers (e.g., C4.5, CBA, SVM, NN). Moreover, in terms of association rule based classification, GARC could filter out many candidate itemsets in the generation process, resulting in a much smaller set of rules than that of CBA.
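The information gain measure named in the abstract can be illustrated with a minimal sketch. This is only the textbook definition of information gain over a categorical attribute, not the GARC algorithm itself; the function names and the toy data are hypothetical and chosen for illustration.

```python
from collections import Counter
from math import log2


def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    total = len(labels)
    return -sum((n / total) * log2(n / total) for n in Counter(labels).values())


def information_gain(rows, labels, attribute):
    """Entropy reduction obtained by splitting the rows on one attribute."""
    groups = {}
    for row, label in zip(rows, labels):
        groups.setdefault(row[attribute], []).append(label)
    # Weighted entropy of the class labels after the split.
    remainder = sum(len(g) / len(labels) * entropy(g) for g in groups.values())
    return entropy(labels) - remainder


# Toy data (hypothetical): "outlook" determines the class, "windy" does not.
rows = [
    {"outlook": "sunny", "windy": "no"},
    {"outlook": "sunny", "windy": "yes"},
    {"outlook": "rainy", "windy": "no"},
    {"outlook": "rainy", "windy": "yes"},
]
labels = ["play", "play", "stay", "stay"]
gains = {a: information_gain(rows, labels, a) for a in ("outlook", "windy")}
```

In a gain-based candidate generation step, attributes with higher gain (here "outlook") would presumably be preferred over uninformative ones (here "windy").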

Knowledge and Information Systems | 2011

Methods for mining frequent items in data streams: an overview

Hongyan Liu; Yuan Lin; Jiawei Han

In many real-world applications, information such as web click data, stock ticker data, sensor network data, phone call records, and traffic monitoring data appears in the form of data streams. Online monitoring of data streams has emerged as an important research undertaking. Estimating the frequency of the items on these streams is an important aggregation and summary technique for both stream mining and data management systems with a broad range of applications. This paper reviews the state-of-the-art progress on methods of identifying frequent items from data streams. It describes different kinds of models for the frequent items mining task. For general models such as cash register and turnstile, we classify existing algorithms into sampling-based, counting-based, and hashing-based categories. The processing techniques and data synopsis structure of each algorithm are described and compared by evaluation measures. Accordingly, as an extension of the general data stream model, four more specific models are introduced: the time-sensitive model, the distributed model, the hierarchical and multi-dimensional model, and the skewed data model. The characteristics and limitations of the algorithms of each model are presented, and open issues awaiting study and improvement are discussed.
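As an illustration of the counting-based category, here is a minimal sketch of the Misra-Gries summary, a well-known counting-based algorithm for the cash register (insert-only) model. The abstract does not say which algorithms the survey covers, so the choice of Misra-Gries, the function name, and the sample stream are assumptions made for illustration.

```python
def misra_gries(stream, k):
    """Summarize a stream with at most k - 1 counters.

    Every item whose true frequency exceeds n / k (n = stream length)
    is guaranteed to remain in the returned dictionary. The stored
    counts are underestimates of the true frequencies.
    """
    counters = {}
    for item in stream:
        if item in counters:
            counters[item] += 1
        elif len(counters) < k - 1:
            counters[item] = 1
        else:
            # No free counter: decrement all and drop those that hit zero.
            for key in list(counters):
                counters[key] -= 1
                if counters[key] == 0:
                    del counters[key]
    return counters


# Hypothetical stream: "a" occurs 5 times out of 9, which exceeds 9 / 3.
stream = ["a", "b", "a", "c", "a", "d", "a", "b", "a"]
candidates = misra_gries(stream, k=3)
```

The summary uses memory proportional to k rather than to the number of distinct items. A second pass over the data, where one is possible, can turn the candidate set into exact counts.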

Electronic Commerce Research and Applications | 2013

Combining user preferences and user opinions for accurate recommendation

Hongyan Liu; Jun He; Tingting Wang; Wenting Song; Xiaoyang Du

Recommendation systems represent a popular research area with a variety of applications. Such systems provide personalized services to the user and help address the problem of information overload. However, traditional recommendation methods such as collaborative filtering suffer from low accuracy because of data sparseness. We propose a novel recommendation algorithm based on the analysis of online reviews. The algorithm incorporates two new methods for opinion mining and recommendation. As opposed to traditional methods, which are usually based on the similarity of ratings to infer user preferences, the proposed recommendation method analyzes the difference between the ratings and opinions of the user to identify the user's preferences. This method considers both explicit ratings and implicit opinions, which can address the problem of data sparseness. We propose a new feature and opinion extraction method based on the characteristics of online reviews to effectively extract the opinion of the user from a customer review written in Chinese. Based on these methods, we also conduct an empirical study of online restaurant customer reviews to create a restaurant recommendation system and demonstrate the effectiveness of the proposed methods.

International Conference on Data Engineering | 2006

C-Cubing: Efficient Computation of Closed Cubes by Aggregation-Based Checking

Dong Xin; Zheng Shao; Jiawei Han; Hongyan Liu

It is well recognized that data cubing often produces huge outputs. Two popular efforts devoted to this problem are (1) iceberg cube, where only significant cells are kept, and (2) closed cube, where a group of cells which preserve roll-up/drill-down semantics are losslessly compressed to one cell. Due to its usability and importance, efficient computation of closed cubes still warrants a thorough study. In this paper, we propose a new measure, called closedness, for efficient closed data cubing. We show that closedness is an algebraic measure and can be computed efficiently and incrementally. Based on the closedness measure, we develop an aggregation-based approach, called C-Cubing (i.e., Closed-Cubing), and integrate it into two successful iceberg cubing algorithms: MM-Cubing and Star-Cubing. Our performance study shows that C-Cubing runs almost one order of magnitude faster than the previous approaches. We further study how the performance of the alternative algorithms of C-Cubing varies w.r.t. the properties of the data sets.

Knowledge Discovery and Data Mining | 2010

Mining closed episodes from event sequences efficiently

Wenzhi Zhou; Hongyan Liu; Hong Cheng

Recent studies have proposed different methods for mining frequent episodes. In this work, we study the problem of mining closed episodes based on minimal occurrences. We study the properties of minimal occurrences and design effective pruning techniques to prune non-closed episodes. An efficient mining algorithm Clo_episode is proposed to mine all closed episodes following a breadth-first search order and integrating the pruning techniques. Experimental results demonstrate the efficiency of our mining algorithm and the compactness of the mining result set.

Information Sciences | 2009

Top-down mining of frequent closed patterns from very high dimensional data

Hongyan Liu; Xiaoyu Wang; Jun He; Jiawei Han; Dong Xin; Zheng Shao

Frequent pattern mining is an essential theme in data mining. Existing algorithms usually use a bottom-up search strategy. However, for very high dimensional data, this strategy cannot fully utilize the minimum support constraint to prune the rowset search space. In this paper, we propose a new method called top-down mining together with a novel row enumeration tree to make full use of the pruning power of the minimum support constraint. Furthermore, to efficiently check if a rowset is closed, we develop a method called the trace-based method. Based on these methods, an algorithm called TD-Close is designed for mining a complete set of frequent closed patterns. To enhance its performance further, we improve it by using new pruning strategies and new data structures that lead to a new algorithm TTD-Close. Our performance study shows that the top-down strategy is effective in cutting down search space and saving memory space, while the trace-based method facilitates the closeness-checking. As a result, the algorithm TTD-Close outperforms the bottom-up search algorithms such as Carpenter and FPclose in most cases. It also runs faster than TD-Close.
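To make the notion of a closed pattern concrete, the sketch below checks closedness by brute force: an itemset is closed when it equals the intersection of all rows that contain it. This is only the definition, not the trace-based method or the TD-Close/TTD-Close algorithms; the function names and the toy rows are hypothetical.

```python
def closure(itemset, transactions):
    """Intersection of all transactions that contain the itemset.

    Returns None when no transaction contains the itemset.
    """
    covering = [t for t in transactions if itemset <= t]
    if not covering:
        return None
    return frozenset.intersection(*covering)


def is_closed(itemset, transactions):
    """An itemset is closed if no proper superset has the same support."""
    return closure(itemset, transactions) == itemset


# Toy rows (hypothetical): item "b" appears wherever "a" appears.
transactions = [frozenset("abc"), frozenset("abd"), frozenset("ab")]
closed_a = is_closed(frozenset("a"), transactions)
closed_ab = is_closed(frozenset("ab"), transactions)
```

Here {a} is not closed because {a, b} has the same support of three rows, while {a, b} itself is closed. This brute-force check scans every row per itemset, the kind of cost that dedicated closedness-checking methods aim to reduce.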

Asia-Pacific Web Conference | 2013

Detecting Event Rumors on Sina Weibo Automatically

Shengyun Sun; Hongyan Liu; Jun He; Xiaoyong Du

Sina Weibo has become one of the most popular social networks in China. In the meantime, it has also become a good place to spread various kinds of spam. Unlike previous studies on detecting spam such as ads, pornographic messages and phishing, we focus on identifying event rumors (rumors about social events), which are more harmful than other kinds of spam, especially in China. To detect event rumors among an enormous number of posts, we studied the characteristics of event rumors and extracted features which can distinguish rumors from ordinary posts. The experiments conducted on a real dataset show that the new features are effective in improving the rumor classifier. Further analysis of the event rumors reveals that they can be classified into 4 different types. We propose an approach for detecting one major type, text-picture unmatched event rumors. The experiment demonstrates that this approach performs well.

Knowledge Discovery and Data Mining | 2009

Exploiting the Block Structure of Link Graph for Efficient Similarity Computation

Pei Li; Yuanzhe Cai; Hongyan Liu; Jun He; Xiaoyong Du

In many real-world domains, a link graph is one of the most effective ways to model the relationships between objects. Measuring the similarity of objects in a link graph has been studied by many researchers, but an effective and efficient method is still lacking. Based on our observation of link graphs from real domains, we find that a block structure naturally exists. We propose an algorithm called BlockSimRank, which partitions the link graph into blocks and obtains the similarity of each node pair in the graph efficiently. Our method is based on a random walk on a two-layer model, with time complexity as low as O(n^{4/3}) and a lower memory requirement. Experiments show that the accuracy of BlockSimRank is acceptable when the time cost is the lowest.
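For context, the sketch below implements the plain iterative SimRank measure, which the name BlockSimRank suggests as a baseline. It is not the block-partitioned algorithm from the paper: it scores every node pair on every iteration, which is the cost a block-based method would aim to avoid. The decay factor of 0.8, the iteration count, and the toy graph are assumptions.

```python
def simrank(in_neighbors, decay=0.8, iterations=10):
    """Naive SimRank over a graph given as {node: [in-neighbor, ...]}.

    Two nodes are similar when they are pointed to by similar nodes.
    Every in-neighbor must itself appear as a key of the dictionary.
    """
    nodes = list(in_neighbors)
    sim = {(a, b): 1.0 if a == b else 0.0 for a in nodes for b in nodes}
    for _ in range(iterations):
        updated = {}
        for a in nodes:
            for b in nodes:
                if a == b:
                    updated[(a, b)] = 1.0
                elif not in_neighbors[a] or not in_neighbors[b]:
                    # A node without in-links has no evidence of similarity.
                    updated[(a, b)] = 0.0
                else:
                    total = sum(
                        sim[(i, j)]
                        for i in in_neighbors[a]
                        for j in in_neighbors[b]
                    )
                    updated[(a, b)] = decay * total / (
                        len(in_neighbors[a]) * len(in_neighbors[b])
                    )
        sim = updated
    return sim


# Toy graph (hypothetical): one page links to two others.
graph = {"univ": [], "prof_a": ["univ"], "prof_b": ["univ"]}
scores = simrank(graph)
```

In this toy graph the two pages that share their only in-neighbor receive a similarity equal to the decay factor, while the page with no in-links has zero similarity to either.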

ACM Transactions on Information Systems | 2010

Mining near-duplicate graph for cluster-based reranking of web video search results

Zi Huang; Bo Hu; Hong Cheng; Heng Tao Shen; Hongyan Liu; Xiaofang Zhou

Recently, video search reranking has been an effective mechanism to improve the initial text-based ranking list by incorporating visual consistency among the result videos. While existing methods attempt to rerank all the individual result videos, they suffer from several drawbacks. In this article, we propose a new video reranking paradigm called cluster-based video reranking (CVR). The idea is to first construct a video near-duplicate graph representing the visual similarity relationship among videos, then identify the near-duplicate clusters from the video near-duplicate graph, then rank the obtained near-duplicate clusters based on cluster properties and intercluster links, and finally select and return a representative video for each ranked cluster. Compared to existing methods, the new CVR ranks clusters and exhibits several advantages, including superior reranking by utilizing more reliable cluster properties, fast reranking on a small number of clusters, and diverse and representative results. Particularly, we formulate the near-duplicate cluster identification as a novel maximally cohesive subgraph mining problem. By leveraging the designed cluster scoring properties indicating the clusters' importance and quality, random walk is applied over the near-duplicate cluster graph to rank clusters. An extensive evaluation study proves the novelty and superiority of our proposals over existing methods.

Advanced Data Mining and Applications | 2011

TagClus: a random walk-based method for tag clustering

Jianwei Cui; Hongyan Liu; Jun He; Pei Li; Xiaoyong Du; Puwei Wang

Tagging behavior on the Internet has seen a dramatic increase in recent years, and social tagging has become a popular way to organize and share resources. However, ambiguity and large quantities of tags restrict its effective use for resource searching and classifying. Tag clustering can group tags with similar semantics together, thus helping alleviate these problems. In this paper, we introduce a random walk-based method to measure relevance between tags by exploiting the relationship between tags and resources. Based on this, we also develop a novel clustering method, TagClus, which can address several challenges in tag clustering. Experimental results on a real dataset show that our methods achieve good accuracy and acceptable performance for tag clustering.
