Shengzhong Feng | Researchain

Publication

Featured research published by Shengzhong Feng.

Knowledge Based Systems | 2012

Topic oriented community detection through social objects and link analysis in social networks

Zhongying Zhao; Shengzhong Feng; Qiang Wang; Joshua Zhexue Huang; Graham J. Williams; Jianping Fan

Community detection is an important issue in social network analysis. Most existing methods detect communities through analyzing the linkage of the network. The drawback is that each community identified by those methods can only reflect the strength of connections, but it cannot reflect the semantics such as the interesting topics shared by people. To address this problem, we propose a topic oriented community detection approach which combines both social objects clustering and link analysis. We first use a subspace clustering algorithm to group all the social objects into topics. Then we divide the members that are involved in those social objects into topical clusters, each corresponding to a distinct topic. In order to differentiate the strength of connections, we perform a link analysis on each topical cluster to detect the topical communities. Experiments on real data sets have shown that our approach was able to identify more meaningful communities. The quantitative evaluation indicated that our approach can achieve a better performance when the topics are at least as important as the links to the analysis.

Knowledge Based Systems | 2013

Feature selection via maximizing global information gain for text classification

Changxing Shang; Min Li; Shengzhong Feng; Qingshan Jiang; Jianping Fan

A novel feature selection metric called global information gain (GIG) is proposed. An efficient algorithm called maximizing global information gain (MGIG) is developed. MGIG performs better than other algorithms (IG, mRMR, JMI, DISR) in most cases. MGIG runs significantly faster than mRMR, JMI and DISR, and is comparable in speed with IG.

Feature selection is a vital preprocessing step in text classification, used to address the curse of dimensionality. Most existing metrics (such as information gain) only evaluate features individually and completely ignore the redundancy between them. This can decrease the overall discriminative power because one feature's predictive power is weakened by others. On the other hand, although all higher order algorithms (such as mRMR) take redundancy into account, their high computational complexity makes them unsuitable for the text domain. This paper proposes a novel metric called global information gain (GIG), which avoids redundancy naturally. An efficient feature selection method called maximizing global information gain (MGIG) is also given. We compare MGIG with four other algorithms on six datasets; the experimental results show that MGIG performs better than the other methods in most cases. Moreover, MGIG runs significantly faster than the traditional higher order algorithms, which makes it a suitable choice for feature selection in the text domain.
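The redundancy problem described in this abstract can be illustrated with plain information gain (IG), the per-feature baseline named above. The following is a minimal sketch, not the GIG or MGIG method from the paper; the toy documents, the term names and the helper functions `entropy` and `information_gain` are hypothetical and chosen only for illustration.

```python
import math
from collections import Counter


def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    total = len(labels)
    counts = Counter(labels)
    return -sum((c / total) * math.log2(c / total) for c in counts.values())


def information_gain(feature_values, labels):
    """IG(C; F) = H(C) - H(C | F) for a single discrete feature F."""
    total = len(labels)
    groups = {}
    for value, label in zip(feature_values, labels):
        groups.setdefault(value, []).append(label)
    conditional = sum(len(g) / total * entropy(g) for g in groups.values())
    return entropy(labels) - conditional


# Hypothetical toy corpus: four documents, two classes, three binary term features.
labels = ["sport", "sport", "politics", "politics"]
features = {
    "goal": [1, 1, 0, 0],   # separates the two classes
    "score": [1, 1, 0, 0],  # exact duplicate of "goal", so fully redundant
    "the": [1, 1, 1, 1],    # appears everywhere, carries no class information
}

# Rank terms by their individual information gain, highest first.
scores = {term: information_gain(vals, labels) for term, vals in features.items()}
ranking = sorted(scores, key=scores.get, reverse=True)
```

In this toy case "goal" and "score" tie for the top score even though the second adds nothing once the first is selected. Scoring features one at a time cannot see that, which is the weakness the abstract attributes to individual metrics.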

Information Sciences | 2014

Quick attribute reduction in inconsistent decision tables

Min Li; Changxing Shang; Shengzhong Feng; Jianping Fan

This paper focuses on three types of attribute reducts in inconsistent decision tables: assignment reduct, distribution reduct, and maximum distribution reduct. It is quite inconvenient to judge these three types of reduct directly according to their definitions. This paper proposes judgment theorems for the assignment reduct, the distribution reduct and the maximum distribution reduct, which are expected to greatly simplify the judging of these three types of reducts. On this basis, we derive three new types of attribute significance measures and construct the Q-ARA (Quick Assignment Reduction Algorithm), the Q-DRA (Quick Distribution Reduction Algorithm), and the Q-MDRA (Quick Maximum Distribution Reduction Algorithm). These three algorithms correspond to the three types of reducts. We conduct a series of comparative experiments on twelve UCI (machine learning data repository, University of California at Irvine) data sets (including consistent and inconsistent decision tables) to evaluate the performance of the three proposed reduction algorithms against the related algorithm QuickReduct [9,34]. The experimental results show that QuickReduct possesses weak robustness because it cannot find the reduct even for consistent data sets, whereas our three proposed algorithms show strong robustness because they can find the reduct for each data set. In addition, we compare the Q-DRA with the CEBARKNC (conditional entropy-based algorithm for reduction of knowledge without a computing core) [43] because both find the distribution reduct by using a heuristic search. The experimental results demonstrate that Q-DRA runs faster than CEBARKNC does because the distribution function of Q-DRA has a lower calculation cost. Instructive conclusions for these reduction algorithms are drawn from the perspective of classification performance for the C4.5 and RBF-SVM classifiers. Last, we make a comparison between discernibility matrix-based methods and our algorithms. The experimental results indicate that our algorithms are efficient and feasible.
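To make the notion of an assignment reduct concrete, the sketch below checks a candidate attribute subset directly against the definition as it is usually given in the rough set literature: the subset must preserve, for every object, the set of decision values found in its equivalence class (the generalized decision). This is the brute-force check that the abstract calls inconvenient, not the paper's judgment theorems or Q-ARA; the toy decision table and the function names are assumptions made for illustration.

```python
from collections import defaultdict


def generalized_decision(rows, attrs, decision):
    """For each object, return the set of decision values that occur in its
    equivalence class when objects are compared on `attrs` only."""
    classes = defaultdict(set)
    for row in rows:
        classes[tuple(row[a] for a in attrs)].add(row[decision])
    return [frozenset(classes[tuple(row[a] for a in attrs)]) for row in rows]


def preserves_assignment(rows, subset, all_attrs, decision):
    """True if `subset` yields the same generalized decision as `all_attrs`."""
    return (generalized_decision(rows, subset, decision)
            == generalized_decision(rows, all_attrs, decision))


# Hypothetical inconsistent decision table: rows 0 and 1 agree on every
# condition attribute but disagree on the decision "d".
table = [
    {"a": 0, "b": 0, "c": 0, "d": "yes"},
    {"a": 0, "b": 0, "c": 0, "d": "no"},
    {"a": 1, "b": 0, "c": 1, "d": "yes"},
    {"a": 1, "b": 1, "c": 1, "d": "no"},
]
conditions = ["a", "b", "c"]

keeps_ab = preserves_assignment(table, ["a", "b"], conditions, "d")
keeps_a = preserves_assignment(table, ["a"], conditions, "d")
keeps_b = preserves_assignment(table, ["b"], conditions, "d")
```

Here `["a", "b"]` preserves the generalized decision while neither `["a"]` nor `["b"]` does, so under this definition it is a minimal preserving subset. Checking every subset this way grows exponentially with the number of attributes, which is the motivation for heuristic significance measures.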

Knowledge Based Systems | 2014

Hierarchical clustering algorithm for categorical data using a probabilistic rough set model

Min Li; Shaobo Deng; Lei Wang; Shengzhong Feng; Jianping Fan

Several clustering analysis techniques for categorical data exist to divide similar objects into groups. Some are able to handle uncertainty in the clustering process, whereas others have stability issues. In this paper, we propose a new technique called TMDP (Total Mean Distribution Precision) for selecting the partitioning attribute based on probabilistic rough set theory. On the basis of this technique, with the concept of granularity, we derive a new clustering algorithm, MTMDP (Maximum Total Mean Distribution Precision), for categorical data. The MTMDP algorithm is a robust clustering algorithm that handles uncertainty in the process of clustering categorical data. We compare the MTMDP algorithm with the MMR (Min–Min–Roughness) algorithm, which is the most relevant clustering algorithm, and also compare it with other unstable clustering algorithms, such as k-modes, fuzzy k-modes and fuzzy centroids. The experimental results indicate that the MTMDP algorithm can be successfully used to analyze grouped categorical data because it produces better clustering results.

Pattern Recognition Letters | 2011

An effective discretization based on Class-Attribute Coherence Maximization

Min Li; Shaobo Deng; Shengzhong Feng; Jianping Fan

Discretization of continuous data is one of the important pre-processing tasks in data mining and knowledge discovery. Generally speaking, discretization can lead to improved predictive accuracy of induction algorithms, and the obtained rules are normally shorter and more understandable. In this paper, we present the Class-Attribute Coherence Maximization (CACM) algorithm and the Efficient-CACM algorithm. We have compared the performance of our algorithms with the most relevant discretization algorithm, the Fast Class-Attribute Interdependence Maximization (Fast-CAIM) discretization algorithm (Kurgan and Cios, 2003). Empirical evaluation of our algorithms and Fast-CAIM on 12 well-known datasets shows that ours generate the superior discretization scheme, which can significantly improve the classification performance of the C4.5 and RBF-SVM classifiers. As to the execution time of discretization, ours also prove faster than the Fast-CAIM algorithm, with the Efficient-CACM algorithm having the shortest execution time.
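For readers unfamiliar with class-attribute discretization criteria, the sketch below scores a set of cut points with the CAIM criterion as it is commonly stated: for each interval, square the count of its majority class, divide by the interval's total count, and average over the intervals. It illustrates the baseline family only; it is not the CACM or Efficient-CACM criterion from this paper, and Fast-CAIM's search procedure is not reproduced. The function name `caim` and the toy data are assumptions.

```python
from bisect import bisect_left
from collections import Counter


def caim(values, labels, cut_points):
    """CAIM score of a discretization defined by sorted interior cut points.

    A value v falls into interval r when cut_points[r-1] < v <= cut_points[r],
    with open-ended first and last intervals.
    """
    n_intervals = len(cut_points) + 1
    counts = [Counter() for _ in range(n_intervals)]
    for value, label in zip(values, labels):
        counts[bisect_left(cut_points, value)][label] += 1

    score = 0.0
    for interval in counts:
        total = sum(interval.values())
        if total:  # skip empty intervals to avoid division by zero
            score += max(interval.values()) ** 2 / total
    return score / n_intervals


# Hypothetical toy attribute: class "a" has small values, class "b" large ones.
values = [1, 2, 3, 10, 11, 12]
labels = ["a", "a", "a", "b", "b", "b"]

clean_split = caim(values, labels, [5])    # separates the classes exactly
mixed_split = caim(values, labels, [2.5])  # leaves one "a" among the "b" values
```

A cut that separates the classes cleanly scores higher than one that mixes them, so a greedy discretizer can add the boundary that most increases the score at each step.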

Journal of Systems Engineering and Electronics | 2014

Fast assignment reduction in inconsistent incomplete decision systems

Min Li; Shaobo Deng; Shengzhong Feng; Jianping Fan

This paper focuses on a fast algorithm for computing the assignment reduct in inconsistent incomplete decision systems. It is quite inconvenient to judge the assignment reduct directly according to its definition. We propose the judgment theorem for the assignment reduct in the inconsistent incomplete decision system, which greatly simplifies judging this type of reduct. On this basis, we derive a novel attribute significance measure and construct the fast assignment reduction algorithm (FARA), intended for computing the assignment reduct in inconsistent incomplete decision systems. Finally, we compare FARA with the discernibility matrix-based method through experiments on 13 University of California at Irvine (UCI) datasets, and the experimental results show that FARA is efficient and feasible.

International Conference on Data Mining | 2010

Minimum Spanning Tree Based Classification Model for Massive Data with MapReduce Implementation

Jin Chang; Jun Luo; Joshua Zhexue Huang; Shengzhong Feng; Jianping Fan

Rapid growth of data has provided us with more information, yet it challenges traditional techniques for extracting useful knowledge. In this paper, we propose MCMM, a Minimum spanning tree (MST) based Classification model for Massive data with MapReduce implementation. It can be viewed as an intermediate model between the traditional K nearest neighbor method and the cluster based classification method, aiming to overcome their disadvantages and cope with large amounts of data. Our model is implemented on the Hadoop platform, using its MapReduce programming framework, which is particularly suitable for cloud computing. We have done experiments on several data sets, including real world data from the UCI repository and synthetic data, using Downing 4000 clusters installed with Hadoop. The results show that our model generally outperforms KNN and some other classification methods with respect to accuracy and scalability.

International Journal of Data Warehousing and Mining | 2015

Identifying and Analyzing Popular Phrases Multi-Dimensionally in Social Media Data

Zhongying Zhao; Chao Li; Yong Zhang; Joshua Zhexue Huang; Jun Luo; Shengzhong Feng; Jianping Fan

With the success of social media, social network analysis has become a very hot research topic and attracted much attention in the last decade. Most studies focus on analyzing the whole network from the perspective of topology or contents. However, there is still no systematic model proposed for multi-dimensional analysis on big social media data. Furthermore, little work has been done on identifying emerging new popular phrases and analyzing them multi-dimensionally. In this paper, the authors first propose an interactive systematic framework. In order to detect the emerging new popular phrases effectively and efficiently, they present an N-Pat Tree model and give some filtering mechanisms. They also propose an algorithm to find and analyze new popular phrases multi-dimensionally. The experiments on one-year Tencent-Microblogs data have demonstrated the effectiveness of their work and shown many meaningful results.

Journal of Computers | 2010

Personalized Knowledge Acquisition through Interactive Data Analysis in E-learning System

Zhongying Zhao; Shengzhong Feng; Qingtian Zeng; Jianping Fan; Xiaohong Zhang

Personalized knowledge acquisition is very important for promoting learning efficiency within E-learning systems. To achieve this, the two key problems are acquiring the user’s knowledge requirements and discovering the people who can meet those requirements. In this paper, we present two approaches to realize personalized knowledge acquisition. The first approach aims to mine what knowledge the student requires and to what degree. All the interactive logs accumulated during the question answering process are taken into account to compute each student’s knowledge requirement. The second approach is to construct and analyze a user network based on the interactive data, which aims to find a list of potential contributors. Each student’s potential contributors may satisfy his/her requirement in a timely and accurate manner. We then design an experiment to implement the two approaches. To evaluate their performance, we measure the percentage of satisfying recommendations. The evaluation results show that our approaches can help each student acquire the knowledge that he/she requires efficiently.

International Workshop on Education Technology and Computer Science | 2009

Mining User's Interest from Interactive Behaviors in QA System

Zhongying Zhao; Shengzhong Feng; Yongquan Liang; Qingtian Zeng; Jianping Fan

The user interest model, as a key component of the user model, is very important for personalized or user adaptive E-learning systems. In this paper, we propose an approach for mining a user’s interest from interactive behaviors. We also develop and implement a domain-specific interactive QA system oriented to Artificial Intelligence. The course ontology, predefined to describe the skeleton of the AI course, is used to generate the structure of our interactive QA system. Students can pose and browse questions and answers on their favorite boards. The interactive behaviors, including whether a student has posed a question and the number of times the student has browsed and answered, are considered to compute each student’s interest. The experiment conducted to evaluate the performance of our approach indicates that our method can capture a user’s interest precisely.
