Jianping Fan | Researchain

Publication

Featured research published by Jianping Fan.

Knowledge Based Systems | 2012

Topic oriented community detection through social objects and link analysis in social networks

Zhongying Zhao; Shengzhong Feng; Qiang Wang; Joshua Zhexue Huang; Graham J. Williams; Jianping Fan

Community detection is an important issue in social network analysis. Most existing methods detect communities through analyzing the linkage of the network. The drawback is that each community identified by those methods can only reflect the strength of connections, but it cannot reflect the semantics such as the interesting topics shared by people. To address this problem, we propose a topic oriented community detection approach which combines both social objects clustering and link analysis. We first use a subspace clustering algorithm to group all the social objects into topics. Then we divide the members that are involved in those social objects into topical clusters, each corresponding to a distinct topic. In order to differentiate the strength of connections, we perform a link analysis on each topical cluster to detect the topical communities. Experiments on real data sets have shown that our approach was able to identify more meaningful communities. The quantitative evaluation indicated that our approach can achieve a better performance when the topics are at least as important as the links to the analysis.

Knowledge Based Systems | 2013

Feature selection via maximizing global information gain for text classification

Changxing Shang; Min Li; Shengzhong Feng; Qingshan Jiang; Jianping Fan

A novel feature selection metric called global information gain (GIG) is proposed. An efficient algorithm called maximizing global information gain (MGIG) is developed. MGIG performs better than other algorithms (IG, mRMR, JMI, DISR) in most cases. MGIG runs significantly faster than mRMR, JMI and DISR, and is comparable in speed with IG.

Feature selection is a vital preprocessing step in text classification, used to mitigate the curse of dimensionality. Most existing metrics (such as information gain) evaluate features only individually and completely ignore the redundancy between them. This can decrease the overall discriminative power because one feature's predictive power is weakened by others. On the other hand, although all higher order algorithms (such as mRMR) take redundancy into account, their high computational complexity makes them impractical for the text domain. This paper proposes a novel metric called global information gain (GIG), which avoids redundancy naturally. An efficient feature selection method called maximizing global information gain (MGIG) is also given. We compare MGIG with four other algorithms on six datasets. The experimental results show that MGIG outperforms the other methods in most cases. Moreover, MGIG runs significantly faster than the traditional higher order algorithms, which makes it a suitable choice for feature selection in the text domain.
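To make the redundancy problem concrete, here is a minimal sketch of the classic per-feature information gain (IG) baseline mentioned in the abstract. It is not GIG or MGIG, whose definitions are not given here. The toy documents, the labels, and the function names `entropy` and `information_gain` are assumptions made for illustration only.

```python
import math
from collections import Counter


def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    total = len(labels)
    if total == 0:
        return 0.0
    return -sum((n / total) * math.log2(n / total) for n in Counter(labels).values())


def information_gain(docs, labels, term):
    """IG of the binary feature 'term present / absent' with respect to the labels."""
    present = [y for d, y in zip(docs, labels) if term in d]
    absent = [y for d, y in zip(docs, labels) if term not in d]
    n = len(labels)
    conditional = (len(present) / n) * entropy(present) + (len(absent) / n) * entropy(absent)
    return entropy(labels) - conditional


# Hypothetical toy corpus: each document is a set of terms.
docs = [{"goal", "match"}, {"goal", "team"}, {"vote", "party"}, {"vote", "team"}]
labels = ["sport", "sport", "politics", "politics"]

scores = {t: information_gain(docs, labels, t) for t in ["goal", "vote", "team"]}
print(scores)  # "goal" and "vote" both score 1.0 bit, "team" scores 0.0
```

In this toy corpus "goal" and "vote" each receive the maximum score even though either one alone already separates the classes, so ranking features individually would select both. That is the kind of redundancy the abstract says per-feature metrics ignore.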

International Symposium on Parallel and Distributed Processing and Applications | 2011

Improving Data Locality of MapReduce by Scheduling in Homogeneous Computing Environments

Xiaohong Zhang; Zhiyong Zhong; Shengzhong Feng; Bibo Tu; Jianping Fan

Data locality is one of the critical factors affecting performance. This paper proposes a next-k-node scheduling (NKS) method to improve the data locality of map tasks. The method first calculates a probability for each map task and then preferentially schedules the task with the highest probability. It assigns low probabilities to tasks that would achieve node locality on the nodes that are about to issue requests, so that those tasks are reserved for those nodes. We have implemented the NKS method in hadoop-0.20.2. The experimental results show that, compared with the default task scheduling method in Hadoop, the NKS method reduced the number of map tasks processed without node locality by 78%, reduced the network load caused by those tasks by 77%, and improved the performance of Hadoop MapReduce. The NKS method is therefore well suited to homogeneous environments with network overload.

Information Sciences | 2014

Quick attribute reduction in inconsistent decision tables

Min Li; Changxing Shang; Shengzhong Feng; Jianping Fan

This paper focuses on three types of attribute reducts in inconsistent decision tables: assignment reduct, distribution reduct, and maximum distribution reduct. It is quite inconvenient to judge these three types of reduct directly according to their definitions. This paper proposes judgment theorems for the assignment reduct, the distribution reduct and the maximum distribution reduct, which are expected to greatly simplify the judging of these three types of reducts. On this basis, we derive three new types of attribute significance measures and construct the Q-ARA (Quick Assignment Reduction Algorithm), the Q-DRA (Quick Distribution Reduction Algorithm), and the Q-MDRA (Quick Maximum Distribution Reduction Algorithm). These three algorithms correspond to the three types of reducts. We conduct a series of comparative experiments with twelve UCI (machine learning data repository, University of California at Irvine) data sets (including consistent and inconsistent decision tables) to evaluate the performance of the three reduction algorithms proposed with the relevant algorithm QuickReduct [9,34]. The experimental results show that QuickReduct possesses weak robustness because it cannot find the reduct even for consistent data sets, whereas our proposed three algorithms show strong robustness because they can find the reduct for each data set. In addition, we compare the Q-DRA (Quick Distribution Reduction Algorithm) with the CEBARKNC (conditional entropy-based algorithm for reduction of knowledge without a computing core) [43] because both find the distribution reduct by using a heuristic search. The experimental results demonstrate that Q-DRA runs faster than CEBARKNC does because the distribution function of Q-DRA has a lower calculation cost. Instructive conclusions for these reduction algorithms are drawn from the perspective of classification performance for the C4.5 and RBF-SVM classifiers. 
Finally, we compare discernibility matrix-based methods with our algorithms. The experimental results indicate that our algorithms are efficient and feasible.
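The notion of an inconsistent decision table can be illustrated with a short sketch. It assumes the common rough-set reading in which an attribute subset preserves the "assignment" when every object keeps the same set of possible decisions (its generalized decision) as under the full attribute set. The abstract does not state this definition, so treat it as an assumption. The code checks preservation only, not minimality, and it is not the Q-ARA algorithm. The table and all function names are hypothetical.

```python
from collections import defaultdict


def generalized_decision(rows, cond_attrs, decision):
    """For each object, the set of decisions seen in its equivalence class."""
    classes = defaultdict(set)
    for row in rows:
        classes[tuple(row[a] for a in cond_attrs)].add(row[decision])
    return [frozenset(classes[tuple(row[a] for a in cond_attrs)]) for row in rows]


def is_consistent(rows, cond_attrs, decision):
    """A table is consistent when equal condition values imply equal decisions."""
    return all(len(d) == 1 for d in generalized_decision(rows, cond_attrs, decision))


def preserves_assignment(rows, subset, all_attrs, decision):
    """True if the subset yields the same generalized decision as all attributes."""
    return generalized_decision(rows, subset, decision) == generalized_decision(
        rows, all_attrs, decision
    )


# Hypothetical table: the first two objects agree on a, b, c but differ on d.
table = [
    {"a": 0, "b": 0, "c": 1, "d": "yes"},
    {"a": 0, "b": 0, "c": 1, "d": "no"},
    {"a": 1, "b": 0, "c": 0, "d": "yes"},
    {"a": 1, "b": 1, "c": 0, "d": "no"},
]
attrs = ["a", "b", "c"]

print(is_consistent(table, attrs, "d"))                      # False
print(preserves_assignment(table, ["a", "b"], attrs, "d"))   # True
print(preserves_assignment(table, ["a"], attrs, "d"))        # False
```

Checking a candidate subset this way requires recomputing equivalence classes for every candidate, which hints at why judging reducts directly from their definitions is described as inconvenient.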

The Journal of Supercomputing | 2012

Performance analysis and optimization of MPI collective operations on multi-core clusters

Bibo Tu; Jianping Fan; Jianfeng Zhan; Xiaofang Zhao

The memory hierarchy of multi-core clusters has two dimensions: a vertical memory hierarchy and a horizontal memory hierarchy. This paper proposes a new parallel computation model that abstracts the memory hierarchy of multi-core clusters uniformly at both the vertical and horizontal levels. Experimental results show that the new model predicts communication costs for message passing on multi-core clusters more accurately than previous models, which incorporated only the vertical memory hierarchy. The new model provides the theoretical underpinning for the optimal design of MPI collective operations. Targeting the horizontal memory hierarchy, our methodology for optimizing collective operations on multi-core clusters focuses on hierarchical virtual topology and cache-aware intra-node communication, which are incorporated into existing collective algorithms in MPICH2. As a case study, a multi-core aware broadcast algorithm has been implemented and evaluated. The results of the performance evaluation show that this methodology for optimizing collective operations on multi-core clusters is efficient.

Knowledge Based Systems | 2014

Hierarchical clustering algorithm for categorical data using a probabilistic rough set model

Min Li; Shaobo Deng; Lei Wang; Shengzhong Feng; Jianping Fan

Several clustering analysis techniques for categorical data exist to divide similar objects into groups. Some are able to handle uncertainty in the clustering process, whereas others have stability issues. In this paper, we propose a new technique called TMDP (Total Mean Distribution Precision) for selecting the partitioning attribute based on probabilistic rough set theory. On the basis of this technique, with the concept of granularity, we derive a new clustering algorithm, MTMDP (Maximum Total Mean Distribution Precision), for categorical data. The MTMDP algorithm is a robust clustering algorithm that handles uncertainty in the process of clustering categorical data. We compare the MTMDP algorithm with the MMR (Min–Min–Roughness) algorithm, which is the most relevant clustering algorithm, and with other unstable clustering algorithms, such as k-modes, fuzzy k-modes and fuzzy centroids. The experimental results indicate that the MTMDP algorithm can be successfully used to analyze grouped categorical data because it produces better clustering results.

IEEE Transactions on Vehicular Technology | 2013

Detecting Crowdedness Spot in City Transportation

Siyuan Liu; Yunhuai Liu; Lionel Man-Shuan Ni; Minglu Li; Jianping Fan

A crowdedness spot is a crowded area with an abnormal number of objects. Detecting the crowdedness spots of moving vehicles in an urban area is essential to many applications. An intuitive method is to cluster the objects in areas to obtain density information. Unfortunately, data capturing vehicle mobility has some new features, such as a highly mobile environment, an extremely limited number of sample objects, and nonuniform, biased samples. These features raise new challenges that cause traditional density-based clustering algorithms to fail to retrieve the real clustering property of objects, making the results less meaningful. In this paper, we propose a novel nondensity-based approach called mobility-based clustering. The key idea is that sample objects are employed as “sensors” that perceive the vehicle crowdedness in nearby areas using their instant mobility, rather than as “object representatives.” As such, the mobility of samples is naturally incorporated. Several key factors beyond vehicle crowdedness have been identified, and techniques to compensate for these effects are proposed accordingly. Furthermore, taking the detected crowdedness spots as a label of the taxi, we can identify a particular taxi as a crowdedness taxi that crosses a number of different crowdedness spots. We evaluate the performance of our methods and baseline approaches based on real traffic situations (to retrieve the real traffic crowdedness) and real-life data sets. Finally, interesting findings are presented for further discussion.

International Conference on Cluster Computing | 2008

Multi-core aware optimization for MPI collectives

Bibo Tu; Ming Zou; Jianfeng Zhan; Xiaofang Zhao; Jianping Fan

MPI collective operations on multi-core clusters should be multi-core aware. In this paper, collective algorithms with hierarchical virtual topology address the performance difference among communication levels on multi-core clusters, namely intra-node and inter-node communication. Furthermore, selecting suitable segment sizes for intra-node collective communication caters to the cache hierarchy of multi-core processors. Built on existing collective algorithms in MPICH2, these two techniques form a portable optimization methodology over MPICH2 for collective operations on multi-core clusters. Following this methodology, a multi-core aware broadcast algorithm has been implemented and evaluated as a case study. The results of the performance evaluation show that the multi-core aware optimization methodology over MPICH2 is efficient.
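The idea of a hierarchical virtual topology can be sketched as a two-phase broadcast plan: the root first sends to one leader per remote node, then each leader forwards to the other ranks on its own node. The following is a pure-Python simulation written under simplifying assumptions (flat sends within each phase, lowest rank as leader). It is not MPI code and not the MPICH2 implementation described in the paper. The function name and the rank-to-node mapping are hypothetical.

```python
from collections import defaultdict


def two_level_broadcast_plan(rank_to_node, root):
    """Return (inter_node_sends, intra_node_sends) as lists of (src, dst) rank pairs."""
    by_node = defaultdict(list)
    for rank, node in sorted(rank_to_node.items()):
        by_node[node].append(rank)
    root_node = rank_to_node[root]
    # One leader per node: the root leads its own node, the lowest rank leads the others.
    leaders = {
        node: (root if node == root_node else ranks[0])
        for node, ranks in by_node.items()
    }
    # Phase 1: the root sends across the network once per remote node.
    inter = [(root, leader) for node, leader in sorted(leaders.items()) if node != root_node]
    # Phase 2: each leader forwards within its node (shared-memory traffic only).
    intra = [
        (leaders[node], rank)
        for node, ranks in sorted(by_node.items())
        for rank in ranks
        if rank != leaders[node]
    ]
    return inter, intra


# Hypothetical layout: 8 ranks on 2 nodes with 4 cores each.
rank_to_node = {0: "n0", 1: "n0", 2: "n0", 3: "n0", 4: "n1", 5: "n1", 6: "n1", 7: "n1"}
inter, intra = two_level_broadcast_plan(rank_to_node, root=0)
print(inter)  # [(0, 4)]
print(intra)  # [(0, 1), (0, 2), (0, 3), (4, 5), (4, 6), (4, 7)]
```

With this layout only one message crosses the network, whereas a root that sent directly to every rank would cross it four times. The segment-size tuning for intra-node communication mentioned in the abstract is not modeled here.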

Pattern Recognition Letters | 2011

An effective discretization based on Class-Attribute Coherence Maximization

Min Li; Shaobo Deng; Shengzhong Feng; Jianping Fan

Discretization of continuous data is one of the important pre-processing tasks in data mining and knowledge discovery. Generally speaking, discretization can lead to improved predictive accuracy of induction algorithms, and the obtained rules are normally shorter and more understandable. In this paper, we present the Class-Attribute Coherence Maximization (CACM) algorithm and the Efficient-CACM algorithm. We have compared the performance of our algorithms with the most relevant discretization algorithm, the Fast Class-Attribute Interdependence Maximization (Fast-CAIM) discretization algorithm (Kurgan and Cios, 2003). Empirical evaluation of our algorithms and Fast-CAIM on 12 well-known datasets shows that our algorithms generate superior discretization schemes, which can significantly improve the classification performance of the C4.5 and RBF-SVM classifiers. In terms of execution time, our algorithms are also faster than the Fast-CAIM algorithm, with the Efficient-CACM algorithm having the shortest execution time.
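For orientation, the sketch below scores a discretization scheme with the CAIM criterion used by the baseline, as it is commonly stated: the average over intervals of (largest class count in the interval)² divided by the total count in the interval. This formula is an assumption recalled from the CAIM literature and is not given in the abstract. The CACM criterion itself is not defined here and is therefore not implemented. The quanta matrices and the function name `caim` are hypothetical.

```python
def caim(quanta):
    """CAIM score of a quanta matrix, where quanta[i][r] counts class i in interval r."""
    n_intervals = len(quanta[0])
    total = 0.0
    for r in range(n_intervals):
        column = [row[r] for row in quanta]
        col_sum = sum(column)
        if col_sum:  # skip empty intervals to avoid division by zero
            total += max(column) ** 2 / col_sum
    return total / n_intervals


# Two hypothetical schemes for 10 samples, 2 classes, 2 intervals.
pure = [[5, 0], [0, 5]]    # each interval contains a single class
mixed = [[3, 2], [2, 3]]   # classes are spread across both intervals

print(caim(pure))   # 5.0
print(caim(mixed))  # approximately 1.8
```

A higher score indicates that intervals line up more closely with class boundaries, which is the property class-attribute discretization criteria of this family try to maximize.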

Journal of Systems Engineering and Electronics | 2014

Fast assignment reduction in inconsistent incomplete decision systems

Min Li; Shaobo Deng; Shengzhong Feng; Jianping Fan

This paper focuses on a fast algorithm for computing the assignment reduct in inconsistent incomplete decision systems. It is quite inconvenient to judge the assignment reduct directly according to its definition. We propose a judgment theorem for the assignment reduct in the inconsistent incomplete decision system, which greatly simplifies judging this type of reduct. On this basis, we derive a novel attribute significance measure and construct the fast assignment reduction algorithm (FARA), intended for computing the assignment reduct in inconsistent incomplete decision systems. Finally, we compare FARA with the discernibility matrix-based method through experiments on 13 University of California at Irvine (UCI) datasets, and the experimental results show that FARA is efficient and feasible.
