Publication


Featured research published by Sen Wu.


IEEE Transactions on Systems, Man, and Cybernetics | 2013

Understanding and Enhancement of Internal Clustering Validation Measures

Yanchi Liu; Zhongmou Li; Hui Xiong; Xuedong Gao; Junjie Wu; Sen Wu

Clustering validation has long been recognized as vital to the success of clustering applications. In general, clustering validation falls into two classes: external clustering validation and internal clustering validation. In this paper, we focus on internal clustering validation and present a study of 11 widely used internal clustering validation measures for crisp clustering. The results of this study indicate that these existing measures have certain limitations in different application scenarios. As an alternative, we propose a new internal clustering validation measure, named clustering validation index based on nearest neighbors (CVNN), which is based on the notion of nearest neighbors. This measure can dynamically select multiple objects as representatives for different clusters in different situations. Experimental results show that CVNN outperforms the existing measures on both synthetic data and real-world data in different application scenarios.
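To make the nearest-neighbor idea concrete, here is a minimal Python sketch of a separation score in the spirit of CVNN: for each object it counts how many of its k nearest neighbors lie in a different cluster, averages that fraction per cluster, and reports the worst cluster. It assumes Euclidean distance and uses a hypothetical function name, `knn_separation`; it omits CVNN's compactness term and normalization, so it is an illustration rather than the paper's measure.

```python
import math


def knn_separation(points, labels, k):
    """Simplified nearest-neighbor separation score (illustrative, not full CVNN).

    For each object, compute the fraction of its k nearest neighbors that belong
    to another cluster; average the fractions within each cluster and return the
    largest average. Lower values mean better-separated clusters.
    """
    n = len(points)
    fractions = {}
    for i in range(n):
        # Euclidean distance to every other object (an assumption of this sketch).
        others = sorted(
            (math.dist(points[i], points[j]), j) for j in range(n) if j != i
        )
        neighbors = [j for _, j in others[:k]]
        foreign = sum(1 for j in neighbors if labels[j] != labels[i])
        fractions.setdefault(labels[i], []).append(foreign / k)
    # Report the worst (least separated) cluster.
    return max(sum(f) / len(f) for f in fractions.values())


if __name__ == "__main__":
    pts = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10)]
    print(knn_separation(pts, [0, 0, 0, 1, 1, 1], k=2))  # 0.0: two clean groups
    print(knn_separation(pts, [0, 1, 0, 1, 0, 1], k=2))  # higher: groups are mixed
```

Because the score only inspects each object's local neighborhood, several objects per cluster effectively act as representatives, which is the intuition the abstract describes.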


Systems, Man and Cybernetics | 2012

Missing categorical data imputation approach based on similarity

Sen Wu; Xiaodong Feng; Yushan Han; Qiang Wang

Imputing missing data is an important data mining task, because missing values can affect the results of data mining. In this paper, Missing Categorical Data Imputation Based on Similarity (MIBOS) is proposed to solve this problem. The algorithm defines a similarity model between objects with incomplete data, constructs the similarity matrix of the objects, and then obtains the nearest undifferentiated object set of each object in order to impute the missing data iteratively. During imputation, each imputed value is used immediately within the same iteration and in all following iterations. Experiments on three UCI benchmark data sets show that the proposed algorithm improves completion rate, accuracy, and time efficiency.
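A minimal sketch of similarity-based categorical imputation, loosely following the steps above: similarity is the count of attributes on which two records are both known and equal, and a missing cell is filled only when all of the most similar records that know that attribute agree on its value. The function names, the `None` encoding of missing values, and the agreement rule are assumptions for illustration, not the published MIBOS definitions.

```python
def similarity(x, y):
    """Number of attributes on which both records are known and equal."""
    return sum(1 for a, b in zip(x, y) if a is not None and b is not None and a == b)


def impute_by_similarity(records, max_iter=10):
    """Iteratively fill None cells from the most similar records (illustrative sketch).

    A missing cell is filled only when every most-similar record that knows the
    attribute agrees on its value. Updates are written in place, so a value imputed
    early in a pass is already used for later cells in the same pass.
    """
    data = [list(r) for r in records]  # work on a copy
    for _ in range(max_iter):
        changed = False
        for i, row in enumerate(data):
            for a, value in enumerate(row):
                if value is not None:
                    continue
                donors = [
                    (similarity(row, other), other[a])
                    for j, other in enumerate(data)
                    if j != i and other[a] is not None
                ]
                if not donors:
                    continue
                best = max(s for s, _ in donors)
                if best == 0:
                    continue  # no informative neighbor yet
                candidates = {v for s, v in donors if s == best}
                if len(candidates) == 1:  # most similar donors agree
                    row[a] = candidates.pop()
                    changed = True
        if not changed:
            break
    return data


if __name__ == "__main__":
    rows = [
        ["red", "small", "yes"],
        ["red", "small", None],
        ["blue", "large", "no"],
        [None, "large", "no"],
    ]
    for r in impute_by_similarity(rows):
        print(r)
```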


Knowledge Science, Engineering and Management | 2011

Imputing missing values for mixed numeric and categorical attributes based on incomplete data hierarchical clustering

Xiaodong Feng; Sen Wu; Yanchi Liu

Missing data imputation is a key issue in data preprocessing for data mining. Although there are many methods for imputing missing values, almost every one of them has limitations and is designed for either numeric attributes or categorical attributes. This paper presents IMIC, a new missing value Imputation method for Mixed numeric and categorical attributes based on Incomplete data hierarchical clustering, after introducing a new concept, the Incomplete Set Mixed Feature Vector (ISMFV). The new method is evaluated through comparison experiments on 3 real data sets from UCI.
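The sketch below illustrates the general idea of clustering-based imputation for mixed attributes: a dissimilarity that skips attributes missing in either record, average-linkage agglomerative clustering down to k clusters, and per-cluster mean (numeric) or mode (categorical) imputation. It does not implement the ISMFV concept; the range-normalized distance, the linkage choice, and all names are illustrative assumptions.

```python
from statistics import mean, mode


def mixed_distance(x, y, numeric, ranges):
    """Average per-attribute dissimilarity over attributes known in both records.

    Numeric attributes contribute |x - y| / range; categorical ones contribute 0 or 1.
    Returns 1.0 (maximal) when the records share no known attribute.
    """
    parts = []
    for a, (u, v) in enumerate(zip(x, y)):
        if u is None or v is None:
            continue  # skip attributes missing in either record
        if a in numeric:
            parts.append(abs(u - v) / ranges[a] if ranges[a] else 0.0)
        else:
            parts.append(0.0 if u == v else 1.0)
    return sum(parts) / len(parts) if parts else 1.0


def cluster_and_impute(records, numeric, k):
    """Average-linkage agglomerative clustering to k clusters, then fill each
    missing cell with the cluster mean (numeric) or mode (categorical)."""
    n_attr = len(records[0])
    ranges = {}
    for a in numeric:
        known = [r[a] for r in records if r[a] is not None]
        ranges[a] = (max(known) - min(known)) if known else 0.0

    def linkage(c1, c2):
        return mean(mixed_distance(records[i], records[j], numeric, ranges)
                    for i in c1 for j in c2)

    clusters = [[i] for i in range(len(records))]
    while len(clusters) > k:
        # Merge the closest pair of clusters (p < q, so popping q keeps p valid).
        p, q = min(
            ((p, q) for p in range(len(clusters)) for q in range(p + 1, len(clusters))),
            key=lambda pq: linkage(clusters[pq[0]], clusters[pq[1]]),
        )
        clusters[p].extend(clusters.pop(q))

    filled = [list(r) for r in records]
    for members in clusters:
        for a in range(n_attr):
            known = [records[i][a] for i in members if records[i][a] is not None]
            if not known:
                continue  # nothing to impute from in this cluster
            fill = mean(known) if a in numeric else mode(known)
            for i in members:
                if filled[i][a] is None:
                    filled[i][a] = fill
    return filled, clusters
```

The O(n³) merge loop keeps the example short; it is only practical for tiny inputs.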


International Conference on Logistics Systems and Intelligent Management | 2010

High dimensional data Clustering Algorithm Based on Sparse Feature Vector for Categorical Attributes

Sen Wu; Guiying Wei

An algorithm named Clustering Algorithm Based On Sparse Feature Vector for Categorical Attributes (CABOSFV_C) is proposed for clustering high dimensional data. It compresses data effectively by using the ‘Sparse Feature Vector of a Set for Categorical Data’ without losing the information needed for clustering decisions, and it obtains the clustering result with a single data scan by using the ‘Sparse Feature Dissimilarity of a Set for Categorical Data’ as its distance measure. Because of this data reduction and single-scan strategy, the algorithm has almost linear computational complexity and handles noise effectively. In addition, CABOSFV_C is suitable not only for sparse data but also for complete data; two numeric examples at the end of the paper illustrate this, along with other salient features of the algorithm.
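One way to picture a set-level summary used in a single scan is sketched below. Each object is the set of `attribute=value` items it actually has; a set of objects is summarized by its size, the items common to all members, and the items held by only some members. The dissimilarity |partial| / (n · |common|), the threshold-based one-pass assignment, the small-cluster noise rule, and all names are assumptions chosen for illustration; the paper's exact definitions may differ.

```python
from dataclasses import dataclass


@dataclass
class SetSummary:
    """Compact summary of a set of sparse objects (hypothetical names).

    n: number of objects; common: items held by every object;
    partial: items held by some but not all objects.
    """
    n: int
    common: frozenset
    partial: frozenset

    @staticmethod
    def of(obj):
        return SetSummary(1, frozenset(obj), frozenset())

    def add(self, obj):
        # Update the summary with one more object, without storing the objects.
        common = self.common & obj
        partial = (self.common | self.partial | obj) - common
        return SetSummary(self.n + 1, common, partial)

    def dissimilarity(self):
        # Assumed form: |partial| / (n * |common|); infinite if nothing is shared.
        if not self.common:
            return float("inf")
        return len(self.partial) / (self.n * len(self.common))


def one_pass_cluster(objects, threshold, min_size=2):
    """Scan objects once; join the cluster whose summary stays most compact,
    or open a new cluster if no merge is within the threshold."""
    summaries, members = [], []
    for idx, obj in enumerate(objects):
        obj = frozenset(obj)
        best, best_d = None, float("inf")
        for c, s in enumerate(summaries):
            d = s.add(obj).dissimilarity()
            if d < best_d:
                best, best_d = c, d
        if best is not None and best_d <= threshold:
            summaries[best] = summaries[best].add(obj)
            members[best].append(idx)
        else:
            summaries.append(SetSummary.of(obj))
            members.append([idx])
    # Treat very small clusters as noise (an assumed rule).
    clusters = [m for m in members if len(m) >= min_size]
    noise = [i for m in members if len(m) < min_size for i in m]
    return clusters, noise


if __name__ == "__main__":
    objs = [
        {"color=red", "size=S", "tag=x"},
        {"color=red", "size=S"},
        {"color=blue", "size=L"},
        {"color=blue", "size=L", "tag=y"},
        {"shape=round"},
    ]
    print(one_pass_cluster(objs, threshold=0.5))
```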


International Conference on Industrial and Information Systems | 2010

Study of text classification methods for data sets with huge features

Guiying Wei; Xuedong Gao; Sen Wu

Text classification has attracted growing interest over the past few years. In this paper we review the main approaches that have been taken to text classification. The key techniques, including text models, feature selection methods, and text classification algorithms, are discussed. This work focuses on implementing a text classification system based on Mutual Information, the K-Nearest Neighbor algorithm, and Support Vector Machines. Experimental results on the Reuters collection are also presented. They show that Mutual Information is an efficient dimension reduction method for text data sets with a huge number of features.
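As an illustration of mutual-information feature selection followed by k-nearest-neighbor classification, the sketch below scores each term by the mutual information between "term occurs in the document" and the class label, keeps the top terms, and classifies a query by majority vote among the training documents that share the most selected terms. Documents are treated as sets of tokens; the overlap-count similarity and all names are assumptions, and the SVM part of the study is not shown.

```python
import math
from collections import Counter


def mutual_information(docs, labels, term):
    """MI (in bits) between 'term occurs in document' and the class label."""
    n = len(docs)
    joint = Counter((term in d, y) for d, y in zip(docs, labels))
    occ = Counter(term in d for d in docs)
    cls = Counter(labels)
    mi = 0.0
    for (t, y), count in joint.items():
        p_ty = count / n
        mi += p_ty * math.log2(p_ty / ((occ[t] / n) * (cls[y] / n)))
    return mi


def select_features(docs, labels, k):
    """Keep the k terms with the highest MI (ties broken alphabetically)."""
    vocab = set().union(*docs)
    return sorted(vocab, key=lambda w: (-mutual_information(docs, labels, w), w))[:k]


def knn_predict(docs, labels, query, features, k=3):
    """Majority vote among the k training documents sharing the most selected terms."""
    q = set(query) & set(features)
    ranked = sorted(range(len(docs)), key=lambda i: -len(docs[i] & q))
    votes = Counter(labels[i] for i in ranked[:k])
    return votes.most_common(1)[0][0]


if __name__ == "__main__":
    docs = [
        {"goal", "match", "the"}, {"goal", "team", "the"}, {"match", "team", "a"},
        {"stock", "market", "the"}, {"stock", "price", "a"}, {"market", "price", "the"},
    ]
    labels = ["sport"] * 3 + ["finance"] * 3
    feats = select_features(docs, labels, 6)
    print(feats)
    print(knn_predict(docs, labels, {"goal", "team"}, feats))
```

Frequent but uninformative words such as "the" get a mutual information near zero, which is why this criterion works as a dimension reduction step.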


International Conference on Service Operations and Logistics, and Informatics | 2008

High dimensional sparse data Clustering Algorithm Based on Concept Feature Vector (CABOCFV)

Sen Wu; Shujuan Gu; Xuedong Gao

Finding clusters of data objects in high dimensional space is challenging, especially because such data can be sparse and highly skewed. This paper uses concept lattices to solve the high dimensional sparse data clustering problem. Concept Lattice Theory is an effective tool for data analysis and knowledge processing; it integrates the concept intent (attributes) and concept extent (objects) and describes the hierarchical relationships among concept nodes. Constructing a concept lattice is itself a process of concept clustering, but because the lattice is complete, it produces a huge number of concept nodes, whereas concept nodes whose extent is too large or too small are not of interest. This paper proposes an effective high dimensional sparse data clustering algorithm based on concept feature vector (CABOCFV), which reduces the redundancy of concept construction by using concept sparse feature distance and concept feature vector, and introduces an effective noise recognition strategy. CABOCFV is insensitive to the input order of data objects and scans the database only once. Experiments show that CABOCFV is effective and efficient for high dimensional sparse data clustering.
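The intent/extent pairing mentioned above comes from formal concept analysis. The toy sketch below computes both derivation operators on a small binary context, enumerates every concept by brute force, and then keeps only concepts whose extent size falls in a chosen range, mirroring the motivation for discarding very large or very small extents. This is textbook formal concept analysis, not CABOCFV; the context, names, and size bounds are illustrative.

```python
from itertools import combinations


def intent(context, objects):
    """Attributes shared by every object in `objects` (all attributes if empty)."""
    attrs = set().union(*context.values())
    for o in objects:
        attrs &= context[o]
    return frozenset(attrs)


def extent(context, attributes):
    """Objects that have every attribute in `attributes`."""
    return frozenset(o for o, a in context.items() if attributes <= a)


def all_concepts(context):
    """Brute-force enumeration: close every subset of objects into a concept."""
    objs = sorted(context)
    concepts = set()
    for r in range(len(objs) + 1):
        for subset in combinations(objs, r):
            b = intent(context, subset)
            concepts.add((extent(context, b), b))  # (extent, intent) pair
    return concepts


def filter_by_extent(concepts, min_size, max_size):
    """Drop concepts whose extent is too small or too large."""
    return {(e, i) for e, i in concepts if min_size <= len(e) <= max_size}


if __name__ == "__main__":
    ctx = {"d1": {"a", "b"}, "d2": {"a", "b", "c"}, "d3": {"c"}}
    concepts = all_concepts(ctx)
    print(len(concepts), "concepts")
    for e, i in filter_by_extent(concepts, 2, 2):
        print(sorted(e), sorted(i))
```

Enumerating all subsets is exponential in the number of objects, which illustrates why a complete lattice becomes unwieldy on large, high dimensional data.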


Knowledge Science, Engineering and Management | 2015

Distributed Recommendation Algorithm Based on Matrix Decomposition on MapReduce Framework

Sen Wu; Dan Lu; Yannan Du; Xiaodong Feng

This paper presents RAMO, a recommendation algorithm based on matrix operations, which integrates a collaborative filtering algorithm with an information network-based approach. RAMO exploits information from different objects to increase recommendation accuracy. Furthermore, a distributed recommendation algorithm, DRAMD, is proposed based on matrix decomposition using the MapReduce framework. DRAMD can run across multiple cluster nodes to reduce computation time. Test results on the MovieLens dataset show that the algorithms not only achieve better recommendation effectiveness but also improve computational efficiency.
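For intuition about running matrix operations on MapReduce, the sketch below simulates, in a single process, the common pattern for multiplying sparse matrices: mappers key entries of A by column and entries of B by row, a reducer forms partial products for each shared index, and the partial products are then summed per output cell. Multiplying a hypothetical ratings matrix by a hypothetical item-similarity matrix yields recommendation scores. This illustrates the MapReduce pattern only; it is not DRAMD's matrix decomposition, and a real job would run the phases on separate nodes.

```python
from collections import defaultdict


def map_phase(a_entries, b_entries):
    """Emit (k, ('A', i, value)) and (k, ('B', j, value)) keyed by the shared index k."""
    for (i, k), v in a_entries.items():
        yield k, ("A", i, v)
    for (k, j), v in b_entries.items():
        yield k, ("B", j, v)


def reduce_phase(key, values):
    """For one shared index k, emit partial products A[i,k] * B[k,j] keyed by (i, j)."""
    a = [(i, v) for tag, i, v in values if tag == "A"]
    b = [(j, v) for tag, j, v in values if tag == "B"]
    for i, av in a:
        for j, bv in b:
            yield (i, j), av * bv


def sparse_matmul(a_entries, b_entries):
    """MapReduce-style product of sparse matrices stored as {(row, col): value}."""
    groups = defaultdict(list)  # shuffle: group mapper output by key
    for key, value in map_phase(a_entries, b_entries):
        groups[key].append(value)
    result = defaultdict(float)  # second stage: sum partial products per cell
    for key, values in groups.items():
        for cell, prod in reduce_phase(key, values):
            result[cell] += prod
    return dict(result)


if __name__ == "__main__":
    ratings = {("u1", "i1"): 5, ("u1", "i2"): 3, ("u2", "i2"): 4}
    item_sim = {("i1", "i3"): 0.8, ("i2", "i3"): 0.5, ("i2", "i1"): 0.2}
    for cell, score in sorted(sparse_matmul(ratings, item_sim).items()):
        print(cell, round(score, 3))
```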


Systems, Man and Cybernetics | 2011

An algorithm for detecting overlapping community structure in complex networks

Sen Wu; Yue Huang; Deying Xiong; Guiying Wei; Xuedong Gao

Detecting overlapping community structure in complex networks is a general problem in the data mining field. This paper presents a novel approach that identifies communities based on core vertices and shared neighbors and can find both separate and overlapping community structures in networks. Experimental results on synthetic and real-world networks show that the new algorithm identifies communities effectively and efficiently and can find overlapping and bridge vertices.
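A hypothetical core-vertex and shared-neighbor heuristic is sketched below to show how such an approach can yield overlapping communities. Here a core vertex is assumed to be one whose degree is at least that of each neighbor; each core seeds a community with the neighbors that share at least `t` neighbors with it, and any vertex with at least `t` neighbors inside a seed joins that community. These rules, the `min_size` cutoff, and the function names are illustrative assumptions, not the paper's definitions.

```python
def build_adj(edges):
    """Undirected adjacency sets from an edge list."""
    adj = {}
    for u, v in edges:
        adj.setdefault(u, set()).add(v)
        adj.setdefault(v, set()).add(u)
    return adj


def find_communities(adj, t=2, min_size=3):
    """Illustrative core-vertex / shared-neighbor heuristic (hypothetical rules)."""
    deg = {v: len(nbrs) for v, nbrs in adj.items()}
    # 1. Core vertices: degree at least as large as every neighbor's degree.
    cores = [v for v in adj if all(deg[v] >= deg[u] for u in adj[v])]
    # 2. Seed each core with neighbors that share >= t neighbors with it.
    seeds = set()
    for c in cores:
        members = {c} | {v for v in adj[c] if len(adj[v] & adj[c]) >= t}
        if len(members) >= min_size:
            seeds.add(frozenset(members))
    # 3. Drop seeds strictly contained in another seed.
    seeds = [s for s in seeds if not any(s < other for other in seeds)]
    # 4. Attach every vertex to each seed where it has >= t neighbors.
    communities = [set(s) | {v for v in adj if len(adj[v] & s) >= t} for s in seeds]
    overlapping = {v for v in adj if sum(v in c for c in communities) > 1}
    return communities, overlapping


if __name__ == "__main__":
    edges = [(1, 2), (1, 3), (1, 4), (2, 3), (2, 4), (3, 4),   # clique A
             (5, 6), (5, 7), (5, 8), (6, 7), (6, 8), (7, 8),   # clique B
             (0, 1), (0, 2), (0, 5), (0, 6)]                   # bridge vertex 0
    communities, overlapping = find_communities(build_adj(edges))
    print(communities, overlapping)
```

In this toy graph, vertex 0 links two cliques, so it is attached to both communities and reported as overlapping.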


International Conference on Advanced Computer Control | 2011

Clustering algorithm based on Condensed Set Dissimilarity for high dimensional sparse data of categorical attributes

Sen Wu; Juanjuan Liu; Guiying Wei

Categorical data clustering is always challenging, especially when the data are high dimensional and sparse. This paper proposes a new algorithm, named CABOC, for clustering high dimensional sparse data with categorical attributes. Based on a newly defined concept, the ‘Condensed Set Dissimilarity’, the algorithm directly computes the dissimilarity of all the objects with sparse categorical attributes in a set. Furthermore, during computation the algorithm records only a Condensed Set Reduction vector of the set, which is defined to represent simply and accurately the information about all the objects in the set that is needed for clustering. As a result, the computational complexity of the algorithm is low. A numeric example of customer cluster analysis illustrates the effectiveness of the algorithm.
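The appeal of a condensed set vector is that set summaries can be combined without revisiting the objects. The sketch below assumes a representation of our own choosing, an object count plus a count for each `(attribute, value)` pair, so merging two sets is plain addition and a dissimilarity can be read directly off the vector. The class name and the dissimilarity formula are hypothetical stand-ins for the paper's Condensed Set Dissimilarity.

```python
from collections import Counter
from dataclasses import dataclass


@dataclass
class CondensedSummary:
    """Hypothetical condensed vector: object count plus a count per (attribute, value)."""
    n: int
    counts: Counter

    @staticmethod
    def of(obj):
        # obj is a dict of the categorical attributes an object actually has
        return CondensedSummary(1, Counter(obj.items()))

    def merge(self, other):
        # Summaries combine by addition; the original objects are never revisited.
        return CondensedSummary(self.n + other.n, self.counts + other.counts)

    def dissimilarity(self):
        # Assumed measure: (# items held by some but not all) / (n * # items held by all).
        shared = sum(1 for c in self.counts.values() if c == self.n)
        partial = sum(1 for c in self.counts.values() if 0 < c < self.n)
        return float("inf") if shared == 0 else partial / (self.n * shared)


if __name__ == "__main__":
    c1 = {"region": "north", "plan": "gold"}
    c2 = {"region": "north", "plan": "gold", "addon": "tv"}
    c3 = {"region": "north", "plan": "silver"}
    s12 = CondensedSummary.of(c1).merge(CondensedSummary.of(c2))
    print(s12.dissimilarity())  # 0.25: two very similar customers
    print(s12.merge(CondensedSummary.of(c3)).dissimilarity())
```

Because merging is associative, partial summaries could be built independently and combined later, which keeps the per-set state small.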


Database Technology and Applications | 2010

Research on Ontology-Based Text Representation of Vector Space Model

Guiying Wei; Mingming Bao; Sen Wu

Collaboration


Top Co-Authors

Guiying Wei

University of Science and Technology Beijing

Xuedong Gao

University of Science and Technology Beijing

Xiaodong Feng

University of Science and Technology Beijing

Dan Lu

University of Science and Technology Beijing

Deying Xiong

University of Science and Technology Beijing

Juanjuan Liu

University of Science and Technology Beijing

Lei Zou

University of Science and Technology Beijing

Min Jiang

University of Science and Technology Beijing

Qiang Wang

University of Science and Technology Beijing
