Yu Zong
Anhui University
Publication
Featured research published by Yu Zong.
Archive | 2013
Guandong Xu; Yu Zong; Zhenglu Yang
Data mining has witnessed substantial advances in recent decades. New research questions and practical challenges have arisen from emerging areas and applications within the various fields closely related to human daily life, e.g. social media and social networking. This book aims to bridge the gap between traditional data mining and the latest advances in newly emerging information services. It explores the extension of well-studied algorithms and approaches into these new research arenas.
Expert Systems With Applications | 2015
Xin Li; Guandong Xu; Enhong Chen; Yu Zong
We explore an auxiliary resource, the time stamps of ratings, for POI recommendation. We consider the partial order between ratings rather than their numeric values. We model user behavior in a novel way by incorporating comparative choice. We devise a stochastic gradient descent algorithm via collection-wise learning. Experiments on two real datasets show that our method outperforms other methods. With the prevalence of GPS-enabled smartphones, the Location Based Social Network (LBSN) has emerged and become a hot research topic during the past few years. As one of the most important components of LBSNs, Points of Interest (POIs) have been extensively studied by both academia and industry, yielding POI recommendations that enhance the user experience of exploring a city. Conventional methods use the rating vectors of users and POIs for similarity calculation, which can be inaccurate because user biases differ. In our opinion, the rating values themselves do not give the exact preferences of users; however, the numeric order of the ratings given by a user within a certain period provides a hint of that user's preference order over POIs. Firstly, we propose an approach that models users' preferences by employing utility theory. Secondly, we devise a collection-wise learning method over partial orders through an effective stochastic gradient descent algorithm. We test our model on two real-world datasets, Yelp and TripAdvisor, comparing it with state-of-the-art approaches including PMF and several user preference modeling methods. In terms of MAP and Recall, we achieve an average improvement of 15% over the baseline methods. The results show the significance of comparative choice within a certain time window and its superiority over the existing methods.
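The idea of reading a preference order from ratings given within a time window can be illustrated with a minimal sketch. It is not the authors' model: the function name `preference_pairs`, the `window` parameter, and the sample ratings are all hypothetical, and the utility-theory modeling and the collection-wise learning step are not shown.

```python
from itertools import combinations

def preference_pairs(ratings, window):
    """Derive (preferred_poi, other_poi) pairs for one user.

    ratings: list of (poi, rating, timestamp) tuples (hypothetical format).
    A pair is emitted only when two different POIs were rated within
    `window` time units of each other and the ratings differ.
    """
    pairs = set()
    for (p1, r1, t1), (p2, r2, t2) in combinations(ratings, 2):
        if p1 == p2 or r1 == r2 or abs(t1 - t2) > window:
            continue  # same POI, tie, or outside the time window
        pairs.add((p1, p2) if r1 > r2 else (p2, p1))
    return pairs

# Hypothetical ratings: (POI, rating, day on which the rating was given)
ratings = [("cafe", 5, 1), ("museum", 3, 2), ("park", 4, 3), ("bar", 2, 40)]
pairs = preference_pairs(ratings, window=7)
print(sorted(pairs))
```

Only the relative order inside the window is kept, so a user who rates everything high and a user who rates everything low can yield the same pairs. The rating for "bar" falls outside the window and contributes none.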
International Conference on Knowledge Based and Intelligent Information and Engineering Systems | 2010
Guandong Xu; Yu Zong; Peter Dolog; Yanchun Zhang
Web clustering is an approach for aggregating Web objects into groups according to the underlying relationships among them. Finding co-clusters of Web objects is an interesting topic in Web usage mining, because co-clusters capture users' underlying navigational interests and content preferences simultaneously. In this paper we present an algorithm that uses bipartite spectral clustering to co-cluster Web users and pages. The usage data of users visiting Web sites is modeled as a bipartite graph, and spectral clustering is then applied to this graph representation. The proposed approach is evaluated by experiments on real datasets, and the impact of using various clustering algorithms is also investigated. Experimental results demonstrate that the method can effectively reveal closely related subsets of Web users and pages.
International Conference on Knowledge Based and Intelligent Information and Engineering Systems | 2011
Guandong Xu; Yu Zong; Rong Pan; Peter Dolog; Ping Jin
In social annotation systems, users label digital resources with tags, which are freely chosen textual descriptors. Tags serve as additional metadata used to index, annotate and retrieve resources. Poor retrieval performance remains a major challenge for most social annotation systems, owing to the ambiguity, redundancy and weak semantics of tags. Clustering is a useful approach for handling these problems. In this paper, we propose a novel clustering algorithm named kernel information propagation for tag clustering. This approach starts from the kernel density estimation of the k-nearest-neighbor (KNN) directed graph to reveal the prestige rank of tags in tagging data. The random walk with restart algorithm is then employed to determine the center points of tag clusters. The main strength of the proposed approach is its capability of partitioning tags from the perspective of tag prestige rank rather than by intuitive similarity calculation alone. Experimental studies on three real-world datasets demonstrate the effectiveness and superiority of the proposed method.
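For readers unfamiliar with random walk with restart, the following is a minimal generic sketch of that technique on a small directed graph. It is not the paper's algorithm: the function name, the `restart` value of 0.15, and the toy graph are assumptions, and neither the kernel density estimation step nor the selection of cluster centers is shown.

```python
def random_walk_with_restart(adj, seed, restart=0.15, iters=200, tol=1e-12):
    """Visiting probabilities of a walker that returns to `seed` with
    probability `restart` at every step.

    adj: dict mapping each node to a list of its out-neighbours; every
    neighbour must also appear as a key.
    """
    nodes = list(adj)
    p = {n: (1.0 if n == seed else 0.0) for n in nodes}
    for _ in range(iters):
        nxt = {n: 0.0 for n in nodes}
        nxt[seed] += restart  # restart mass; p always sums to 1
        for n in nodes:
            out = adj[n]
            if not out:
                # Dangling node: send its remaining mass back to the seed
                nxt[seed] += (1.0 - restart) * p[n]
            else:
                share = (1.0 - restart) * p[n] / len(out)
                for m in out:
                    nxt[m] += share
        delta = max(abs(nxt[n] - p[n]) for n in nodes)
        p = nxt
        if delta < tol:
            break
    return p

# Hypothetical directed graph; node "d" has no incoming edges
graph = {"a": ["b", "c"], "b": ["c"], "c": ["a"], "d": ["c"]}
scores = random_walk_with_restart(graph, seed="a")
print(sorted(scores, key=scores.get, reverse=True))
```

Nodes that are easy to reach from the seed receive high scores, which is the intuition behind using such scores to rank candidates around a starting point.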
Journal of Computers | 2013
Yu Zong; Ping Jin; Guandong Xu; Rong Pan
Clustering, an important unsupervised learning technique, is widely used to discover the inherent structure of a given data set. Because clustering depends on the application, researchers use different models to define clustering problems. Heuristic clustering algorithms are an efficient way to deal with clustering problems defined by a combinatorial optimization model, but sensitivity to initialization is an inevitable problem. In the past decades, many methods have been proposed to deal with this problem. In this paper, on the contrary, we take advantage of the initialization sensitivity to design a new clustering algorithm. Firstly, we run K-means, a widely used heuristic clustering algorithm, on the data set multiple times to generate several clustering results; secondly, we propose a structure named Local Accumulative Knowledge (LAKE) to capture the common information of the clustering results; thirdly, we execute the single-linkage algorithm on LAKE to generate a rough clustering result; finally, we assign the remaining data objects to the corresponding clusters. Experimental results on synthetic and real-world data sets demonstrate the superiority of the proposed approach in terms of clustering quality measures.
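One simple way to capture what several clustering runs have in common is a co-association count followed by a single-linkage cut. The sketch below illustrates that general idea only; it is not the LAKE structure itself, whose definition the abstract does not give. The label lists stand in for the output of repeated K-means runs, and all names are hypothetical.

```python
from itertools import combinations

def coassociation(runs):
    """Fraction of runs in which each pair of objects shares a cluster.

    runs: list of label lists, one per clustering run, all the same length.
    """
    n = len(runs[0])
    return {
        (i, j): sum(labels[i] == labels[j] for labels in runs) / len(runs)
        for i, j in combinations(range(n), 2)
    }

def single_linkage_cut(n, coassoc, threshold):
    """Connected components of the graph that links pairs whose
    co-association is >= threshold. This equals cutting a single-linkage
    dendrogram at that similarity level."""
    parent = list(range(n))

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    for (i, j), weight in coassoc.items():
        if weight >= threshold:
            parent[find(i)] = find(j)
    groups = {}
    for i in range(n):
        groups.setdefault(find(i), []).append(i)
    return sorted(groups.values())

# Hypothetical labels from three runs over six objects; run 2 is the same
# partition as run 1 with the label names swapped
runs = [
    [0, 0, 0, 1, 1, 1],
    [1, 1, 1, 0, 0, 0],
    [0, 0, 1, 1, 1, 1],
]
coassoc = coassociation(runs)
strict = single_linkage_cut(6, coassoc, threshold=1.0)
loose = single_linkage_cut(6, coassoc, threshold=0.6)
print(strict, loose)
```

With the strict threshold, object 2 stays on its own because the runs disagree about it. It plays the role of a remaining object that still has to be assigned to a cluster.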
Intelligent Information Systems | 2015
Guandong Xu; Yu Zong; Ping Jin; Rong Pan; Zongda Wu
In social annotation systems, users annotate digital data sources with tags, which are freely chosen textual descriptions. Tags serve as additional metadata used to index, annotate and retrieve resources. Poor retrieval performance remains a major challenge for most social annotation systems, owing to the ambiguity, redundancy and weak semantics of tags. Clustering is a useful tool for handling these problems. In this paper, we propose a novel tag clustering algorithm based on kernel information propagation. This approach starts from the kernel density estimation of the kNN neighborhood directed graph to reveal the prestige rank of tags in tagging data. The random walk with restart algorithm is then employed to determine the center points of tag clusters. The main strength of the proposed approach is its capability of partitioning tags from the perspective of tag prestige rank rather than by intuitive similarity calculation alone. Experimental studies on six real-world data sets demonstrate the effectiveness and superiority of the proposed method over other state-of-the-art clustering approaches in terms of various evaluation metrics.
Advanced Data Mining and Applications | 2011
Yu Zong; Guandong Xu; Ping Jin; Yanchun Zhang; Enhong Chen; Rong Pan
In social annotation systems, users label digital resources with tags, which are freely chosen textual descriptions. Tags serve as additional metadata used to index, annotate and retrieve resources. Poor retrieval performance remains a major problem for most social tagging systems, owing to the ambiguity, redundancy and weak semantics of tags. Clustering is a useful tool for addressing these difficulties. Most research on tag clustering directly applies traditional clustering algorithms such as K-means or Hierarchical Agglomerative Clustering to tagging data, and these algorithms have inherent drawbacks such as sensitivity to initialization. In this paper, we instead make use of the approximate backbone of tag clustering results to find better tag clusters. In particular, we propose an APProximate backbonE-based Clustering algorithm for Tags (APPECT). The main steps of APPECT are: (1) we execute the K-means algorithm on a tag similarity matrix M times and collect a set of tag clustering results $Z = \{C_1, C_2, \ldots, C_M\}$; (2) we form the approximate backbone of Z by executing a greedy search; (3) we fix the approximate backbone as the initial tag clustering result and then assign the remaining tags to the corresponding clusters based on similarity. Experimental results on three real-world datasets, namely MedWorm, MovieLens and Dmoz, demonstrate the effectiveness and superiority of the proposed method over the traditional approaches.
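Steps (2) and (3) can be sketched in a simplified form. The code below is an illustration under stated assumptions, not APPECT itself: instead of the greedy search named in the abstract, it takes the exact intersection of the runs (tags that share a cluster in every run), and it assigns each remaining tag to the core with the highest mean similarity. The label lists, the similarity matrix, and the function names are hypothetical.

```python
def backbone_by_intersection(runs):
    """Group tags that share a cluster in every run.

    runs: list of label lists, one per K-means run. Two tags belong to the
    same core when their label tuples across all runs coincide. This is a
    stand-in for the greedy search, which is not specified here.
    """
    by_signature = {}
    for tag, signature in enumerate(zip(*runs)):
        by_signature.setdefault(signature, []).append(tag)
    cores = [group for group in by_signature.values() if len(group) > 1]
    rest = [group[0] for group in by_signature.values() if len(group) == 1]
    return cores, rest

def assign_rest(cores, rest, sim):
    """Attach each remaining tag to the core with the highest mean similarity.

    sim: square similarity matrix as a list of lists, sim[i][j] in [0, 1].
    """
    clusters = [list(core) for core in cores]
    for tag in rest:
        best = max(
            range(len(cores)),
            key=lambda k: sum(sim[tag][j] for j in cores[k]) / len(cores[k]),
        )
        clusters[best].append(tag)
    return clusters

# Hypothetical labels from three runs over five tags
runs = [
    [0, 0, 0, 1, 1],
    [1, 1, 0, 0, 0],
    [0, 0, 1, 1, 1],
]
# Hypothetical symmetric tag similarity matrix
sim = [
    [1.0, 0.9, 0.8, 0.1, 0.2],
    [0.9, 1.0, 0.7, 0.2, 0.1],
    [0.8, 0.7, 1.0, 0.2, 0.1],
    [0.1, 0.2, 0.2, 1.0, 0.9],
    [0.2, 0.1, 0.1, 0.9, 1.0],
]
cores, rest = backbone_by_intersection(runs)
clusters = assign_rest(cores, rest, sim)
print(cores, rest, clusters)
```

Because the cores are fixed before any assignment, the result does not depend on the order in which the remaining tags are processed.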
International Symposium on Neural Networks | 2012
Yu Zong; Guandong Xu; Ping Jin; Xun Yi; Enhong Chen; Zongda Wu
High-dimensional clustering is often encountered in real applications, and projective clustering is an effective way to deal with high-dimensional clustering problems because it aims to capture the dense areas embedded in subsets of attributes (subspaces). Most projective clustering algorithms use hyper-rectangle structures of equal or varying width to identify the dense areas and their locations. Deciding the widths of these hyper-rectangle structures is therefore a crucial task in projective clustering. Naturally, using the real data distribution directly to determine the widths of the dense structures is a promising and feasible approach. In this paper, we propose a projective clustering algorithm based on a hyper-rectangle structure whose width is estimated from the kernel distribution of the real data. In particular, we first define a structure called the Significant Local Dense Area (SLDA) by using an efficient kernel density estimator, Rodeo; we then design a greedy search method to find all SLDAs covering the data distribution in the high-dimensional space; finally, we run a single-linkage clustering algorithm on the SLDAs to form the final clusters and identify the outliers. The main strength of the proposed algorithm is validated by experiments on synthetic and real-world data sets.
Information Processing Letters | 2011
Yu Zong; Guandong Xu; Ping Jin; Yanchun Zhang; Enhong Chen
Clustering is an important research area with numerous applications in pattern recognition, machine learning, and data mining. Since the clustering problem on numeric data sets can be formulated as a typical combinatorial optimization problem, many studies have addressed the design of heuristic algorithms for finding sub-optimal solutions in a reasonable period of time. However, most heuristic clustering algorithms are sensitive to initialization and do not guarantee high-quality results. Recently, the Approximate Backbone (AB), i.e., the commonly shared intersection of several sub-optimal solutions, has been proposed to address the initialization sensitivity problem. In this paper, we introduce the AB into heuristic clustering to overcome the initialization sensitivity of conventional heuristic clustering algorithms. The main advantage of the proposed method is that defining the AB restricts the initial search space to the neighborhood of the optimal result, which reduces the impact of initialization on clustering and ultimately improves the performance of heuristic clustering. Experiments on synthetic and real-world data sets validate the effectiveness of the proposed approach in comparison with three conventional heuristic clustering algorithms and three other algorithms with improved initialization.
Knowledge Discovery and Data Mining | 2008
Yu Zong; XianChao Zhang; He Jiang; Mingchu Li
Due to the inherently sparse, noisy and nearly zero-difference characteristics of high-dimensional data sets, traditional clustering methods fail to detect meaningful clusters in them. Subspace clustering attempts to find the true distribution inherent in subsets of the original attributes. However, which subspace contains the true clustering result is usually uncertain. From this point of view, subspace clustering can be regarded as an uncertain discursion problem. In this paper, we first develop a criterion to evaluate creditable subspaces, which contain the meaningful clustering results, and then propose a creditable subspace labeling method (CSL) based on D-S evidence theory. The creditable subspaces of the original data space can be found by iteratively executing the CSL algorithm. Once the creditable subspaces are obtained, the true clustering results can be found by running a traditional clustering algorithm on each creditable subspace. Experiments show that CSL can detect the actual creditable subspaces with the original attributes. In this way, we propose a novel approach that uses traditional clustering algorithms to deal with high-dimensional data sets.
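D-S (Dempster-Shafer) evidence theory combines belief from independent sources with Dempster's rule of combination. The sketch below shows that standard rule only; how CSL defines its evidence and labels subspaces is not described in the abstract, so the two mass functions and the hypothesis names are hypothetical.

```python
from itertools import product

def dempster_combine(m1, m2):
    """Combine two mass functions with Dempster's rule.

    m1, m2: dicts mapping frozenset hypotheses to masses that sum to 1,
    both defined over the same frame of discernment.
    """
    combined = {}
    conflict = 0.0
    for (a, mass_a), (b, mass_b) in product(m1.items(), m2.items()):
        intersection = a & b
        if intersection:
            combined[intersection] = combined.get(intersection, 0.0) + mass_a * mass_b
        else:
            conflict += mass_a * mass_b  # mass given to contradictory pairs
    if conflict >= 1.0:
        raise ValueError("total conflict: the sources cannot be combined")
    # Renormalize so that the remaining masses sum to 1
    return {h: mass / (1.0 - conflict) for h, mass in combined.items()}

# Hypothetical evidence about whether one subspace is creditable
YES = frozenset({"creditable"})
NO = frozenset({"not_creditable"})
EITHER = YES | NO  # ignorance: mass not committed to either hypothesis

source_1 = {YES: 0.6, EITHER: 0.4}
source_2 = {YES: 0.5, NO: 0.3, EITHER: 0.2}
belief = dempster_combine(source_1, source_2)
print({tuple(sorted(h)): round(mass, 4) for h, mass in belief.items()})
```

Mass assigned to `EITHER` expresses uncertainty explicitly, which is the property that makes the theory a natural fit when it is unclear which subspace holds the true clustering.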