Zhan Bu

Nanjing University of Finance and Economics

Publication

Featured research published by Zhan Bu.

IEEE Transactions on Knowledge and Data Engineering | 2016

Fast and Accurate Mining the Community Structure: Integrating Center Locating and Membership Optimization

Hui-Jia Li; Zhan Bu; Aihua Li; Zhidong Liu; Yong Shi

Mining communities or clusters in networks is valuable in analyzing, designing, and optimizing many natural and engineering complex systems, e.g., protein networks, power grids, and transportation systems. Most existing techniques view the community mining problem as an optimization problem based on a given quality function (e.g., modularity); however, none of them is grounded in a systematic theory for identifying the central nodes in the network. Moreover, how to reconcile mining efficiency and community quality remains an open problem. In this paper, we attempt to address the above challenges by introducing a novel algorithm. First, a kernel function with a tunable influence factor is proposed to measure the leadership of each node; the nodes with the highest local leadership can be viewed as the candidate central nodes. Then, we use a discrete-time dynamical system to describe the dynamical assignment of community membership, and formulate several conditions to guarantee the convergence of each node's dynamic trajectory, by which the hierarchical community structure of the network can be revealed. The proposed dynamical system is independent of the quality function used, so it could also be applied in other community mining models. Our algorithm is highly efficient: the computational complexity analysis shows that the execution time is nearly linearly dependent on the number of nodes in sparse networks. Finally, we give demonstrative applications of the algorithm to a set of synthetic benchmark networks and real-world networks to verify the algorithmic performance.

Knowledge Based Systems | 2013

A fast parallel modularity optimization algorithm (FPMQA) for community detection in online social network

Zhan Bu; Chengcui Zhang; Zhengyou Xia; Jiandong Wang

As information technology has advanced, people are turning more frequently to electronic media for communication, and social relationships are increasingly found in online channels. Discovering the latent communities therein is a useful way to better understand the properties of a virtual social network. Traditional community-detection tasks only consider the structural characteristics of a social organization, so additional information about nodes and edges, such as semantic information, cannot be exploited. What is more, the typical size of virtual spaces is now counted in millions, if not billions, of nodes and edges, and most existing algorithms are incapable of analyzing such large-scale dense networks. In this paper, we first introduce an interesting social network model (Interest Network) in which a link between two IDs is built if they both participate in discussions about one or more topics/stories. In this case, we say that the two connected IDs have similar interests. Then, the edges of the initial network are updated using the attitude consistency information of the connected ID pairs. A given pair of IDs i and j may both reply to some topics/IDs, and their implicit orientations/attitudes toward these jointly replied topics/IDs may not be the same. We use a simple statistical method to calculate the attitude consistency, whose value lies between 0 and 1; a higher value corresponds to a greater degree of consistency of the given ID pair toward topics/IDs. The updated network is called the Similar-View Network (SVN). In the second part, a fast parallel modularity optimization algorithm (FPMQA), which performs a greedy optimization analogous to CNM and FUC, is used to conduct community discovery. By using a parallel design and sophisticated data structures, it runs fast, in $O(k^{\max}(k^{\max} + \log k^{\max}))$ time. Finally, we propose an evaluation metric, based on reliable ground truths, for online network community detection. In the experimental work, we evaluate our method using real datasets and compare our approach with several previous methods; the results show that our method is more effective and accurate in finding potential online communities.
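
The abstract above names modularity as the quantity that FPMQA optimizes greedily. As a point of reference, the following is a minimal sketch of the standard Newman-Girvan modularity of a fixed partition, assuming an undirected, unweighted graph without self-loops. It is not the FPMQA algorithm itself, and the function name `modularity` and the toy graph are illustrative only.

```python
from collections import defaultdict

def modularity(edges, community):
    """Newman-Girvan modularity Q of a partition (illustrative helper).

    edges: list of (u, v) pairs, each undirected edge listed once, no self-loops.
    community: dict mapping node -> community label.
    """
    m = len(edges)
    if m == 0:
        return 0.0
    internal = defaultdict(int)  # edges with both endpoints in the same community
    degree = defaultdict(int)    # summed degree of the nodes in each community
    for u, v in edges:
        cu, cv = community[u], community[v]
        degree[cu] += 1
        degree[cv] += 1
        if cu == cv:
            internal[cu] += 1
    # Q = sum over communities of (internal edge fraction - expected fraction)
    return sum(internal[c] / m - (degree[c] / (2 * m)) ** 2 for c in degree)

# Toy graph (an assumption for illustration): two triangles joined by one edge.
edges = [(0, 1), (1, 2), (0, 2), (3, 4), (4, 5), (3, 5), (2, 3)]
split = {0: "a", 1: "a", 2: "a", 3: "b", 4: "b", 5: "b"}
merged = {n: "a" for n in range(6)}
q_split = modularity(edges, split)
q_merged = modularity(edges, merged)
```

On this toy graph the two-triangle split scores higher than placing every node in one community, which is the kind of comparison a greedy modularity optimizer makes at each merge step.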

IEEE Transactions on Systems, Man, and Cybernetics | 2016

Local Community Mining on Distributed and Dynamic Networks From a Multiagent Perspective

Zhan Bu; Zhiang Wu; Jie Cao; Yichuan Jiang

Distributed and dynamic networks are ubiquitous in many real-world applications. Because such networks are huge in scale, decentralized, and dynamic, a global topological view is either too hard to obtain or not available at all. As a result, most existing community detection methods, which work on the global view, fail to handle such decentralized and dynamic large networks. In this paper, we propose a novel autonomy-oriented computing-based method for community mining (AOCCM) from the multiagent perspective in the distributed environment. In particular, AOCCM utilizes reactive agents to pick the neighboring node with the largest structural similarity as the candidate node, and then determines whether it should be added to the local community based on the modularity gain. We further improve AOCCM into a more efficient incremental version, named AOCCM-i, for mining communities from dynamic networks. AOCCM and AOCCM-i can be easily extended to detect both nonoverlapping and overlapping global community structures. Experimental results on real-life networks demonstrate that the proposed methods can reduce the computational cost by avoiding repeated structural similarity calculation while still obtaining high-quality communities.

Information Fusion | 2017

CAMAS: A cluster-aware multiagent system for attributed graph clustering

Zhan Bu; Guangliang Gao; Hui-Jia Li; Jie Cao

Attributed graphs describe nodes via attribute vectors and relationships between different nodes via edges. To partition nodes into clusters with tighter correlations, an effective way is to apply clustering techniques to attributed graphs based on various criteria such as node connectivity and/or attribute similarity. Even though clusters typically form around nodes with tight edges and similar attributes, existing methods have focused on only one of these two data modalities. In this paper, we regard each node as an autonomous agent and develop an accurate and scalable multiagent system for extracting overlapping clusters in attributed graphs. First, a kernel function with a tunable bandwidth factor δ is introduced to measure the influence of each agent, and the agents with the highest local influence can be viewed as the “leader” agents. Then, a novel local expansion strategy is proposed, which each leader agent can apply to absorb the most relevant followers in the graph. Finally, we design the cluster-aware multiagent system (CAMAS), in which agents communicate with each other freely under an efficient communication mechanism. Using the proposed multiagent system, we are able to uncover the optimal overlapping cluster configuration, i.e. nodes within one cluster are not only closely connected with each other but also have similar attributes. Our method is highly efficient, and the computational time is shown to be nearly linearly dependent on the number of edges when δ ∈ [0.5, 1). Finally, applications of the proposed method to a variety of synthetic benchmark graphs and real-life attributed graphs are demonstrated to verify the systematic performance.

Information Systems Frontiers | 2014

An FAR-SW based approach for webpage information extraction

Zhan Bu; Chengcui Zhang; Zhengyou Xia; Jiandong Wang

Automatically identifying and extracting the target information of a webpage, especially the main text, is a critical task in many web content analysis applications, such as information retrieval and automated screen reading. However, compared with typical plain texts, the structures of information on the web are extremely complex and have no single fixed template or layout. In addition, the number of presentation elements on web pages, such as dynamic navigational menus, flashing logos, and a multitude of ad blocks, has increased rapidly in the past decade. In this paper, we propose a statistics-based approach that integrates the concept of fuzzy association rules (FAR) with that of a sliding window (SW) to efficiently extract the main text content from web pages. Our approach involves two separate stages. In Stage 1, the original HTML source is pre-processed and features are extracted for every line of text; then, supervised learning is performed to detect fuzzy association rules in training web pages. In Stage 2, the necessary HTML source preprocessing and text line feature extraction are conducted in the same way as in Stage 1, after which each text line is tested against the extracted fuzzy association rules to determine whether it belongs to the main text. Next, a sliding window is applied to segment the web page into several potential topical blocks. Finally, a simple selection algorithm is utilized to select the important blocks, which are then united as the detected topical region (main text). Experimental results on real-world data show that the efficiency and accuracy of our approach are better than those of existing Document Object Model (DOM)-based and vision-based approaches.

Online Information Review | 2016

Discovering shilling groups in a real e-commerce platform

Youquan Wang; Zhiang Wu; Zhan Bu; Jie Cao; Dun Yang

Purpose – With the popularity of e-commerce, shilling attacks are becoming more rampant on online shopping websites. Shilling attackers publish mendacious ratings as well as reviews to promote or suppress target products. The purpose of this paper is to investigate the behavior of group shilling, a new type of shilling attack, in a real e-commerce platform (e.g. Amazon.cn). Design/methodology/approach – Several behavioral features are proposed for modeling the shilling group, and an unsupervised ranking method based on principal component analysis (PCA) is then presented for identifying shilling groups among real users on Amazon.cn. Findings – As indicated by the behavior analysis, the proposed method has successfully identified a number of shilling groups on Amazon. Meanwhile, the effectiveness of the proposed features and the accuracy of the proposed unsupervised method are carefully validated. Originality/value – This paper presents a set of solutions for discovering shilling groups when the ground truth labels ...

Cluster Computing | 2016

SIMPLE: a simplifying-ensembling framework for parallel community detection from large networks

Zhiang Wu; Guangliang Gao; Zhan Bu; Jie Cao

Community detection is a classic and very difficult task in complex network analysis. With the explosive growth of social media, scaling community detection methods to large networks has attracted considerable recent interest. In this paper, we propose a novel SIMPLifying and Ensembling (SIMPLE) framework for parallel community detection. It employs random link sampling to simplify the network and obtains basic partitionings on every sampled graph. Then, K-means-based Consensus Clustering is used to ensemble a number of basic partitionings into high-quality community structures. All phases in SIMPLE, including random sampling, sampled graph partitioning, and consensus clustering, are encapsulated into MapReduce for parallel execution. Experiments on six real-world social networks analyze key parameters and factors inside SIMPLE, and demonstrate both the effectiveness and the efficiency of SIMPLE.
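
The first two phases described above (random link sampling, then a basic partitioning per sampled graph) can be sketched as follows. This is a rough illustration under stated assumptions, not the SIMPLE implementation: the abstract does not say which partitioner runs on each sampled graph, so connected components (via union-find) stand in for it here, and the consensus clustering and MapReduce steps are omitted. The names `sample_links` and `connected_components` and the `rate` parameter are hypothetical.

```python
import random

def sample_links(edges, rate, n_samples, seed=0):
    """Keep each edge independently with probability `rate`, n_samples times."""
    rng = random.Random(seed)
    return [[e for e in edges if rng.random() < rate] for _ in range(n_samples)]

def connected_components(nodes, edges):
    """Stand-in base partitioner (an assumption): label nodes by component."""
    parent = {n: n for n in nodes}

    def find(x):
        # Path-halving find for the union-find structure.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    for u, v in edges:
        parent[find(u)] = find(v)
    return {n: find(n) for n in nodes}

# Toy input (illustrative): two disjoint triangles.
nodes = list(range(6))
edges = [(0, 1), (1, 2), (0, 2), (3, 4), (4, 5), (3, 5)]
samples = sample_links(edges, rate=0.8, n_samples=4, seed=42)
basic_partitionings = [connected_components(nodes, s) for s in samples]
```

Each entry of `basic_partitionings` maps every node to a component label; a consensus step would then combine these labelings into one final community structure.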

Archive | 2015

Discovering Communities in Multi-relational Networks

Zhiang Wu; Zhan Bu; Jie Cao; Yi Zhuang

Multi-relational networks (MRNs for short) are networks whose nodes are of a single type but are connected to each other through multiple types of relations. MRNs are prevalent in the real world. For example, interactions in social networks include various kinds of information diffusion: email exchange, instant messaging services, and so on. Community detection is a long-standing yet very difficult task in social network analysis, especially on MRNs. This chapter gradually explores the research into discovering communities from MRNs. It begins by introducing the generalized modularity of the MRN, which paves the way for applying modularity optimization-based community detection methods to MRNs. However, the mainstream methods for discovering communities on MRNs integrate information from multiple dimensions. The existing integration methods fall into four categories: network integration, utility integration, feature integration, and partition integration. Learning or ranking the weight of each relation in an MRN is a building block of network, utility, and feature integration. Thus, we turn our attention to several co-ranking frameworks on MRNs. We then discuss two different kinds of partition integration strategies: the frequent pattern mining-based method and the consensus clustering-based method. Finally, for the purpose of performance validation, we present several techniques for constructing MRNs from both multivariate data and forum data.

Knowledge and Information Systems | 2018

GLEAM: a graph clustering framework based on potential game optimization for large-scale social networks

Zhan Bu; Jie Cao; Hui-Jia Li; Guangliang Gao; Haicheng Tao

With the explosive growth of online social networks, the study of large-scale graph clustering has attracted considerable interest. Most traditional methods view the graph clustering problem as an optimization problem based on a given objective function; however, there are few methodical theories for the emergence of clusters over real-life networks. In this paper, each actor in online social networks is viewed as a selfish player in a non-cooperative game. The strategy associated with each node is defined as the cluster membership vector, and each one’s incentive is to maximize its own social identity by adopting the most suitable strategy. The definition of the utility function in our game model is inspired by conformity psychology; it is defined as the weighted average of one’s social identity obtained by participating in different clusters. With this setting, the proposed game matches a potential game well, so that clusters can be shaped by the actions of closely interacting users who adopt the same strategy in a Nash equilibrium. To this end, we propose a novel Graph cLustering framework based on potEntial gAme optiMization (GLEAM) for parallel graph clustering. It first utilizes the cosine similarity to weight each edge in the original network. Then, an initial partition, including a number of clusters dominated by potential “leader nodes”, is created by a fast heuristic process. Third, a potential game-based weighted modularity optimization is used to improve the initial partition. Finally, we introduce the notion of a potentially attractive cluster, and then discover the overlapping partition of the graph using a simple double-threshold procedure. Three phases in GLEAM are carefully designed for parallel execution. Experiments on real-world networks analyze the convergence inside GLEAM, and demonstrate the high performance of GLEAM by comparing it with state-of-the-art community detection approaches in the literature.
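
The abstract above says GLEAM first weights each edge by cosine similarity but does not say which vectors are compared. The sketch below assumes one common structural choice: the cosine similarity between the closed neighborhoods (neighbors plus the node itself) of the two endpoints. Both that choice and the name `cosine_edge_weights` are assumptions for illustration, not the paper's definition.

```python
import math
from collections import defaultdict

def cosine_edge_weights(edges):
    """Weight each undirected edge (u, v) by |N[u] & N[v]| / sqrt(|N[u]| * |N[v]|),
    where N[x] is the closed neighborhood of x (an assumed definition)."""
    nbrs = defaultdict(set)
    for u, v in edges:
        nbrs[u].add(v)
        nbrs[v].add(u)
    weights = {}
    for u, v in edges:
        nu = nbrs[u] | {u}  # closed neighborhood of u
        nv = nbrs[v] | {v}  # closed neighborhood of v
        weights[(u, v)] = len(nu & nv) / math.sqrt(len(nu) * len(nv))
    return weights

# Toy inputs (illustrative): a triangle and a three-node path.
triangle = cosine_edge_weights([(0, 1), (1, 2), (0, 2)])
path = cosine_edge_weights([(0, 1), (1, 2)])
```

Under this definition, edges inside a tightly knit group get weights close to 1, while edges whose endpoints share few neighbors get lower weights, which is the signal a weighted modularity step can then exploit.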

Journal of Computational Science | 2017

A pattern-based topic detection and analysis system on Chinese tweets

Lu Zhang; Zhiang Wu; Zhan Bu; Ye Jiang; Jie Cao

Online social media is able to convey rich and timely information about real-world events. Uncovering events on social media and sensing topics from them can yield much valuable information, which has attracted significant research effort. However, due to the large scale of the data, detecting events or topics in real time is still a challenging problem. In this paper, we propose a Pattern-based Topic Detection and Analysis System (PTDAS) on Weibo, a Twitter-like platform in China. As one of the key components of the whole system, an FP-growth-like algorithm is employed to mine cosine interesting patterns from a set of tweets, which are then summarized as topics. Specifically, in order to discover topics in real time, we parallelize the algorithm on Spark for efficient mining. Along with pattern-based topic detection, we also present some analytic techniques, including both topic evolution analysis and sentiment analysis. Extensive experiments on a real-world data set demonstrate the effectiveness and efficiency of PTDAS.
