Zhiang Wu

Nanjing University of Finance and Economics

Publication

Featured research published by Zhiang Wu.

Knowledge Discovery and Data Mining | 2012

HySAD: a semi-supervised hybrid shilling attack detector for trustworthy product recommendation

Zhiang Wu; Junjie Wu; Jie Cao; Dacheng Tao

Shilling attackers apply biased rating profiles to recommender systems to manipulate online product recommendations. Although many studies have been devoted to shilling attack detection, few of them can handle the hybrid shilling attacks that usually happen in practice, and studies of real-life applications are rare. Moreover, little attention has yet been paid to modeling both labeled and unlabeled user profiles, although there are often a few labeled but numerous unlabeled users available in practice. This paper presents a Hybrid Shilling Attack Detector, or HySAD for short, to tackle these problems. In particular, HySAD introduces MC-Relief to select effective detection metrics, and Semi-supervised Naive Bayes (SNB-λ) to precisely separate Random-Filler model attackers and Average-Filler model attackers from normal users. Thorough experiments on MovieLens and Netflix datasets demonstrate the effectiveness of HySAD in detecting hybrid shilling attacks, and its robustness to various obfuscation strategies. A real-life case study on product reviews of Amazon.cn is also provided, which further demonstrates that HySAD can effectively improve the accuracy of a collaborative-filtering based recommender system, and provide interesting opportunities for in-depth analysis of attacker behaviors. These, in turn, justify the value of HySAD for real-world applications.

Knowledge and Information Systems | 2013

Hybrid Collaborative Filtering algorithm for bidirectional Web service recommendation

Jie Cao; Zhiang Wu; Youquan Wang; Yi Zhuang

Web service recommendation has become a hot yet fundamental research topic in service computing. The most popular technique is Collaborative Filtering (CF) based on a user-item matrix. However, it cannot adequately capture the relationship between Web services and providers. To address this issue, we first design a cube model to explicitly describe the relationship among providers, consumers and Web services. We then present a Standard Deviation based Hybrid Collaborative Filtering (SD-HCF) for Web Service Recommendation (WSRec) and an Inverse consumer Frequency based User Collaborative Filtering (IF-UCF) for Potential Consumers Recommendation (PCRec). Finally, the decision-making process of bidirectional recommendation is provided for both providers and consumers. Sets of experiments are conducted on real-world data provided by Planet-Lab. In the experiment phase, we show how the parameters of SD-HCF affect the prediction quality and demonstrate that SD-HCF is much better than extant methods on recommendation quality, including user-based CF, item-based CF and general HCF. An experimental comparison between IF-UCF and UCF indicates the effectiveness of adding inverse consumer frequency to UCF.

World Wide Web | 2013

Shilling attack detection utilizing semi-supervised learning method for collaborative recommender system

Jie Cao; Zhiang Wu; Bo Mao; Yanchun Zhang

The collaborative filtering (CF) technique is capable of generating personalized recommendations. However, recommender systems that use CF as their key algorithm are vulnerable to shilling attacks, which insert malicious user profiles into the systems to push or nuke the reputations of targeted items. There are only a small number of labeled users in most practical recommender systems, while a large number of users are unlabeled because it is expensive to obtain their identities. In this paper, Semi-SAD, a new semi-supervised learning based shilling attack detection algorithm, is proposed to take advantage of both types of data. It first trains a naïve Bayes classifier on a small set of labeled users, and then incorporates unlabeled users with EM-λ to improve the initial naïve Bayes classifier. Experiments on MovieLens datasets are conducted to compare the efficiency of Semi-SAD with a supervised learning based detector and an unsupervised learning based detector. The results indicate that Semi-SAD detects various kinds of shilling attacks better than the others, especially obfuscated and hybrid shilling attacks.
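The two-stage training described in this abstract (a naïve Bayes classifier fitted on labeled users, then refined with unlabeled users) can be illustrated with a minimal sketch. It is not the authors' implementation. It assumes that EM-λ means an EM loop in which each unlabeled instance contributes to the parameter estimates with weight λ, and it assumes binary features (for example, thresholded detection metrics) modeled with a Bernoulli naïve Bayes. The names `fit_nb`, `posterior`, `em_lambda` and the toy data are all hypothetical.

```python
import math

def fit_nb(X, resp, weights, n_classes=2):
    """M-step: weighted Bernoulli naive Bayes with Laplace smoothing.
    resp[i][c] is P(class c | x_i); weights[i] scales instance i."""
    n_feat = len(X[0])
    class_mass = [1.0] * n_classes                     # +1 smoothing per class
    feat_mass = [[1.0] * n_feat for _ in range(n_classes)]
    for x, r, w in zip(X, resp, weights):
        for c in range(n_classes):
            class_mass[c] += w * r[c]
            for j, v in enumerate(x):
                feat_mass[c][j] += w * r[c] * v
    total = sum(class_mass)
    prior = [m / total for m in class_mass]
    # theta[c][j] = (1 + weighted count of feature j) / (2 + weighted class count)
    theta = [[feat_mass[c][j] / (class_mass[c] + 1.0) for j in range(n_feat)]
             for c in range(n_classes)]
    return prior, theta

def posterior(x, prior, theta):
    """E-step for one instance: P(class | x) under the current model."""
    logs = []
    for c in range(len(prior)):
        lp = math.log(prior[c])
        for j, v in enumerate(x):
            lp += math.log(theta[c][j] if v else 1.0 - theta[c][j])
        logs.append(lp)
    top = max(logs)
    exps = [math.exp(l - top) for l in logs]
    z = sum(exps)
    return [e / z for e in exps]

def em_lambda(X_lab, y_lab, X_unl, lam=0.5, n_iter=20):
    """Assumed EM-lambda: unlabeled instances count with weight lam (0 <= lam <= 1)."""
    hard = [[1.0 if y == c else 0.0 for c in range(2)] for y in y_lab]
    # Initial classifier from labeled users only.
    prior, theta = fit_nb(X_lab, hard, [1.0] * len(X_lab))
    for _ in range(n_iter):
        soft = [posterior(x, prior, theta) for x in X_unl]           # E-step
        prior, theta = fit_nb(X_lab + X_unl, hard + soft,            # M-step
                              [1.0] * len(X_lab) + [lam] * len(X_unl))
    return prior, theta

# Toy data (hypothetical): class 1 = attacker, class 0 = normal user.
X_lab = [[1, 1, 0], [0, 0, 1]]
y_lab = [1, 0]
X_unl = [[1, 1, 1], [1, 1, 0], [0, 0, 1], [0, 1, 1], [1, 0, 0], [0, 0, 0]]
prior, theta = em_lambda(X_lab, y_lab, X_unl, lam=0.5)
print(posterior([1, 1, 0], prior, theta))
```

Setting `lam=0` reduces the loop to the purely supervised classifier, while `lam=1` treats unlabeled users as equal to labeled ones, which is the trade-off the weighting is meant to control under this assumption.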

Knowledge Discovery and Data Mining | 2008

SAIL: Summation-bAsed Incremental Learning for Information-Theoretic Text Clustering

Jie Cao; Zhiang Wu; Junjie Wu; Hui Xiong

Information-theoretic clustering aims to exploit information theoretic measures as the clustering criteria. A common practice on this topic is so-called INFO-K-means, which performs K-means clustering with the KL-divergence as the proximity function. While expert efforts on INFO-K-means have shown promising results, a remaining challenge is to deal with high-dimensional sparse data. Indeed, it is possible that the centroids contain many zero-value features for high-dimensional sparse data. This leads to infinite KL-divergence values, which create a dilemma in assigning objects to the centroids during the iteration process of K-means. To meet this dilemma, in this paper, we propose a Summation-based Incremental Learning (SAIL) method for INFO-K-means clustering. Specifically, by using an equivalent objective function, SAIL replaces the computation of the KL-divergence by the computation of the Shannon entropy. This can avoid the zero-value dilemma caused by the use of the KL-divergence. Our experimental results on various real-world document data sets have shown that, with SAIL as a booster, the clustering performance of K-means can be significantly improved. Also, SAIL leads to quick convergence and a robust clustering performance on high-dimensional sparse data.
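The equivalence mentioned in this abstract can be checked numerically. The sketch below is not the SAIL algorithm itself. It only verifies, on hypothetical toy data, the identity that makes the replacement possible, assuming each centroid is the weighted mean of its members: the weighted sum of KL-divergences from instances to their own centroid equals the weighted centroid entropy minus the (constant) weighted instance entropies. It also shows how a zero-valued centroid feature makes the KL-divergence to a foreign centroid infinite. All function names are illustrative.

```python
import math

def entropy(p):
    """Shannon entropy (natural log) of a discrete distribution; 0*log(0) is taken as 0."""
    return -sum(v * math.log(v) for v in p if v > 0)

def kl(p, q):
    """KL-divergence D(p || q); infinite when q is zero where p is positive."""
    total = 0.0
    for a, b in zip(p, q):
        if a > 0:
            if b == 0:
                return math.inf
            total += a * math.log(a / b)
    return total

def centroid(points, weights):
    """Weighted mean of the member distributions."""
    w = sum(weights)
    dim = len(points[0])
    return [sum(wt * x[d] for x, wt in zip(points, weights)) / w
            for d in range(dim)]

def kl_objective(clusters):
    """Sum over clusters of sum_x w_x * D(x || m_k)."""
    total = 0.0
    for points, weights in clusters:
        m = centroid(points, weights)
        total += sum(w * kl(x, m) for x, w in zip(points, weights))
    return total

def entropy_objective(clusters):
    """Sum over clusters of W_k * H(m_k), minus the constant sum_x w_x * H(x)."""
    total = 0.0
    for points, weights in clusters:
        m = centroid(points, weights)
        total += sum(weights) * entropy(m)
        total -= sum(w * entropy(x) for x, w in zip(points, weights))
    return total

# Two sparse toy clusters: (member distributions, member weights).
clusters = [
    ([[0.5, 0.5, 0.0, 0.0], [1.0, 0.0, 0.0, 0.0]], [1.0, 1.0]),
    ([[0.0, 0.0, 0.25, 0.75], [0.0, 0.2, 0.8, 0.0]], [2.0, 1.0]),
]

print(kl_objective(clusters), entropy_objective(clusters))
# Assigning a point to a foreign centroid hits the zero-value problem:
print(kl(clusters[0][0][0], centroid(*clusters[1])))
```

Because the instance-entropy term does not depend on the assignment, minimizing the weighted centroid entropies alone gives the same optimum while never evaluating a logarithm of zero against a foreign centroid.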

Signal Processing | 2013

Towards information-theoretic K-means clustering for image indexing

Jie Cao; Zhiang Wu; Junjie Wu; Wenjie Liu

Information-theoretic K-means (Info-Kmeans) aims to cluster high-dimensional data, such as images featured by the bag-of-features (BOF) model, using K-means algorithm with KL-divergence as the distance. While research efforts along this line have shown promising results, a remaining challenge is to deal with the high sparsity of image data. Indeed, the centroids may contain many zero-value features that create a dilemma in assigning objects to centroids during the iterative process of Info-Kmeans. To meet this challenge, we propose a Summation-bAsed Incremental Learning (SAIL) algorithm for Info-Kmeans clustering in this paper. Specifically, SAIL can avoid the zero-feature dilemma by replacing the computation of KL-divergence between instances and centroids, by the computation of centroid entropies only. To further improve the clustering quality, we also introduce the Variable Neighborhood Search (VNS) meta-heuristic and propose the V-SAIL algorithm. Experimental results on various benchmark data sets clearly demonstrate the effectiveness of SAIL and V-SAIL. In particular, they help to successfully recognize nine out of 11 landmarks from extremely high-dimensional and sparse image vectors, with the presence of severe noise.

Web Information Systems Engineering | 2013

Community Detection in Multi-relational Social Networks

Zhiang Wu; Wenpeng Yin; Jie Cao; Guandong Xu; Alfredo Cuzzocrea

Multi-relational networks are ubiquitous in many fields such as bibliography, Twitter, and healthcare. Many studies in the literature have aimed at discovering communities from social networks. However, most of them have focused on single-relational networks. A handful of methods detect communities from multi-relational networks by first converting them to single-relational networks. Nevertheless, they commonly assume that different relations are independent of each other, which is clearly unrealistic in real-life cases. In this paper, we attempt to address this challenge by introducing a novel co-ranking framework, named MutuRank. It makes full use of the mutual influence between relations and actors to transform the multi-relational network into a single-relational network. We then present GMM-NK (Gaussian Mixture Model with Neighbor Knowledge), based on the local consistency principle, to enhance the performance of the spectral clustering process in discovering overlapping communities. Experimental results on both synthetic and real-world data demonstrate the effectiveness of the proposed method.

Web Age Information Management | 2012

Pick-Up Tree Based Route Recommendation from Taxi Trajectories

Haoran Hu; Zhiang Wu; Bo Mao; Yi Zhuang; Jie Cao; Jingui Pan

Recommending suitable routes to taxi drivers for picking up passengers helps to raise their incomes and reduce gasoline consumption. In this paper, a pick-up tree based route recommender system is proposed to minimize the distance traveled without passengers for a given set of taxis. First, we apply a clustering approach to the GPS trajectory data of a large number of taxis, using the points that indicate a state change from “free” to “occupied”, and take the centroids as potential pick-up points. Second, we propose a heuristic based on skyline computation to construct a pick-up tree whose root node is the current position and which connects all centroids. Then, we present a probability model to estimate the gasoline consumption of every route. By adopting the estimated gasoline consumption as the weight of every route, a weighted Round-Robin recommendation method for the set of taxis is proposed. Our experimental results on a real-world taxi trajectory data set show that the proposed recommendation method effectively reduces the driving distance before carrying passengers, especially when the number of cabs becomes large. Meanwhile, the time cost of our method is also lower than that of the existing methods.

Conference on Recommender Systems | 2011

Semi-SAD: applying semi-supervised learning to shilling attack detection

Zhiang Wu; Jie Cao; Bo Mao; Youquan Wang

Collaborative filtering (CF) based recommender systems are vulnerable to shilling attacks. In some leading e-commerce sites, there exist a large number of unlabeled users, and it is expensive to obtain their identities. Existing research efforts on shilling attack detection fail to exploit these unlabeled users. In this article, Semi-SAD, a new semi-supervised learning based shilling attack detection algorithm, is proposed. Semi-SAD is trained with the labeled and unlabeled user profiles using the combination of a naïve Bayes classifier and EM-λ, an augmented Expectation Maximization (EM) algorithm. Experiments on MovieLens datasets show that our proposed Semi-SAD is efficient and effective.

International Conference on Data Mining | 2015

Spammers Detection from Product Reviews: A Hybrid Model

Zhiang Wu; Youquan Wang; Yaqiong Wang; Junjie Wu; Jie Cao; Lu Zhang

Driven by profits, spam reviews for product promotion or suppression have become increasingly rampant on online shopping platforms. This paper focuses on detecting hidden spam users based on product reviews. In the literature, numerous studies have suggested diverse methods for spammer detection, but whether these methods can be combined effectively for higher performance remains unclear. Along this line, a hybrid PU-learning-based Spammer Detection (hPSD) model is proposed in this paper. On the one hand, hPSD can detect multi-type spammers by injecting or recognizing only a small portion of positive samples, which particularly suits real-world application scenarios. More importantly, hPSD can leverage both user features and user relations to build a spammer classifier via a semi-supervised hybrid learning framework. Experimental results on movie data sets with shilling injection show that hPSD outperforms several state-of-the-art baseline methods. In particular, hPSD shows great potential in detecting hidden spammers as well as their underlying employers from a real-life Amazon data set. These results demonstrate the effectiveness and practical value of hPSD for real-life applications.

The Computer Journal | 2014

Detecting Genuine Communities from Large-Scale Social Networks: A Pattern-Based Method

Zhiang Wu; Jie Cao; Junjie Wu; Youquan Wang; Chunyang Liu

Community detection is a long-standing yet very difficult task in social network analysis. It becomes more challenging as many online social networking sites are evolving into super-large scales. Numerous methods have been proposed for community detection from massive networks, but how to reconcile the partitioning efficiency and the community quality remains an open problem. In this paper, we attempt to address this challenge by introducing a COSine-pattern-based COMmunity extraction framework: COSCOM. The COSCOM adopts an extracting view of community detection. It first extracts the so-called asymptotically equivalent structures (AESs) from networks, from which the nodes are further partitioned into crisp communities using any of the existing methods. Specifically, we prove that an AES is a very tight group of nodes, and is actually a cosine pattern defined by the extended cosine similarity. A novel cosine-pattern mining algorithm based on the ordered antimonotone of cosine similarity is thus proposed for the efficient extraction of AESs. Experiments on various real-world social networks demonstrate the advantage of the extracting view of community detection. In particular, COSCOM shows merits in detecting genuine communities by either internal or external validity.
