Publication


Featured research published by Wenqian Shang.


Expert Systems With Applications | 2007

A novel feature selection algorithm for text categorization

Wenqian Shang; Houkuan Huang; Haibin Zhu; Yongmin Lin; Youli Qu; Zhihai Wang

With the development of the web, large numbers of documents are available on the Internet. Digital libraries, news sources, and internal company data keep growing, so automatic text categorization is increasingly important for dealing with massive data. The major problem of text categorization, however, is the high dimensionality of the feature space. Many methods for text feature selection already exist. To improve the performance of text categorization, we present another one. Our study is based on Gini index theory: we design a novel Gini index algorithm to reduce the high dimensionality of the feature space, and we construct a new Gini index measure function adapted to text categorization. Experimental results show that our improved Gini index performs better than other feature selection methods.
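The abstract does not give the new measure function, so the following is only a minimal sketch of Gini-style feature selection in its textbook purity form: each term is scored by the sum over classes of P(class | term)^2, and the highest-scoring terms are kept. The function names and the toy corpus are hypothetical, and this is not the authors' constructed measure.

```python
from collections import Counter, defaultdict

def gini_scores(docs, labels):
    """Score each term by the sum over classes of P(class | term)^2.

    docs: list of token lists; labels: one class label per document.
    A score of 1.0 means the term occurs in documents of a single class.
    Assumption: textbook purity form, not the paper's measure function.
    """
    term_class_df = defaultdict(Counter)  # term -> class -> document frequency
    for tokens, label in zip(docs, labels):
        for term in set(tokens):
            term_class_df[term][label] += 1
    scores = {}
    for term, per_class in term_class_df.items():
        total = sum(per_class.values())
        scores[term] = sum((n / total) ** 2 for n in per_class.values())
    return scores

def select_features(docs, labels, top_n):
    """Keep the top_n terms with the highest score (ties broken alphabetically)."""
    scores = gini_scores(docs, labels)
    ranked = sorted(scores, key=lambda t: (-scores[t], t))
    return ranked[:top_n]

# Hypothetical toy corpus.
docs = [["goal", "match", "team"], ["team", "vote"], ["vote", "election"]]
labels = ["sport", "politics", "politics"]
print(select_features(docs, labels, 3))
```

In this toy corpus "team" appears in both classes and so scores lowest, which is the behaviour a dimensionality-reducing selector wants: terms spread evenly across classes are dropped first.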


computational intelligence and security | 2005

An improved kNN algorithm – fuzzy kNN

Wenqian Shang; Houkuan Huang; Haibin Zhu; Yongmin Lin; Zhihai Wang; Youli Qu

As a simple, effective, nonparametric classification method, the kNN algorithm is widely used in text classification. It has an obvious problem, however: when the density of the training data is uneven, classification precision may drop if we consider only the ranking of the first k nearest neighbors and ignore the differences in their distances. To solve this problem, we adopt fuzzy set theory and construct a new membership function based on document similarities. We compare the proposed method with other existing kNN methods experimentally. The results show that the fuzzy-set-based algorithm (fkNN) improves the precision and recall of text categorization to a certain degree.
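To make the idea of a similarity-based membership function concrete, here is a small sketch. The abstract does not specify the membership function, so this one is an assumption: each class's membership is the share of total cosine similarity contributed by its neighbours among the k nearest. All names and data are hypothetical.

```python
import math
from collections import Counter, defaultdict

def cosine(a, b):
    """Cosine similarity between two sparse term-count vectors (Counters)."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0

def fuzzy_knn(query, train, k=3):
    """Return class memberships for query, weighted by neighbour similarity.

    train: list of (Counter, label) pairs.
    Assumed membership: a class's share of the summed similarity of the
    k nearest neighbours (not necessarily the paper's function).
    """
    scored = [(cosine(query, vec), label) for vec, label in train]
    neighbours = sorted(scored, key=lambda pair: -pair[0])[:k]
    total = sum(sim for sim, _ in neighbours)
    memberships = defaultdict(float)
    for sim, label in neighbours:
        if total:
            memberships[label] += sim / total
    return dict(memberships)

# Hypothetical training set and query document.
train = [
    (Counter(["goal", "team"]), "sport"),
    (Counter(["vote", "election"]), "politics"),
    (Counter(["vote", "team"]), "politics"),
]
query = Counter(["goal", "team", "match"])
memberships = fuzzy_knn(query, train, k=3)
print(max(memberships, key=memberships.get))
```

A plain majority vote over the same three neighbours would pick "politics" (two neighbours to one), whereas the similarity-weighted memberships favour "sport", because the single sport document is much closer to the query than either politics document.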


international conference on computational science | 2006

An adaptive fuzzy kNN text classifier

Wenqian Shang; Houkuan Huang; Haibin Zhu; Yongmin Lin; Youli Qu; Hongbin Dong

In recent years, the kNN algorithm has attracted the attention of many researchers and has proved to be one of the best text categorization algorithms. Text categorization uses a training set of documents with assigned class labels to decide which class a new, unlabeled document belongs to. The kNN algorithm still has several issues that need further study: improving the decision rule; selecting the value of k; selecting dimensions (i.e. feature set selection); multiclass text categorization; and the algorithm's efficiency in time and space. In this paper, we focus mainly on improving the decision rule and on dimension selection. We design an adaptive fuzzy kNN text classifier, where "adaptive" refers to adaptive dimension selection. The experimental results show that our algorithm is effective and feasible.


computational sciences and optimization | 2012

Advanced Deep Web Crawler Based on Dom

Weicheng Ma; Xiuxia Chen; Wenqian Shang

A large amount of data today is stored only in the deep web. Judging from the work done by others on deep web crawlers, it is evident that no perfect, or even complete, crawler for deep web data has been built. To meet the needs of deep web search, we have worked out a new crawler structure, currently concentrating on extracting data from forms, the most common type of deep web interface. Our crawler introduces several innovations, such as the mainframe extracting module, an algorithm that uses improved Bayesian classification to distinguish different websites with the same URL, and extended support for AJAX forms. In addition, a DOM tree is used to make the analysis and processing of downloaded web pages easier and more visual.
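As a rough illustration of the form-extraction step only, the sketch below pulls each form's action, method, and named fields out of a page. It uses the event-based `html.parser` module from the Python standard library rather than building a full DOM tree, so it is a simplification and not the authors' crawler; the class name and sample page are hypothetical.

```python
from html.parser import HTMLParser

class FormExtractor(HTMLParser):
    """Collect the action, method, and named fields of every form on a page."""

    def __init__(self):
        super().__init__()
        self.forms = []
        self._current = None  # form being read, or None outside a form

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "form":
            self._current = {
                "action": attrs.get("action") or "",
                "method": (attrs.get("method") or "get").lower(),
                "fields": [],
            }
        elif tag in ("input", "select", "textarea") and self._current is not None:
            name = attrs.get("name")
            if name:  # unnamed controls (e.g. bare submit buttons) are skipped
                self._current["fields"].append(name)

    def handle_endtag(self, tag):
        if tag == "form" and self._current is not None:
            self.forms.append(self._current)
            self._current = None

# Hypothetical page with a single search form.
page = (
    '<html><body><form action="/search" method="POST">'
    '<input name="q"><select name="lang"></select>'
    '<input type="submit"></form></body></html>'
)
extractor = FormExtractor()
extractor.feed(page)
print(extractor.forms)
```

A crawler would then fill the listed fields with candidate values and submit the form to reach the pages behind it; that step, and AJAX handling, are outside this sketch.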


international symposium on computers and communications | 2006

An Adaptive Fuzzy kNN Text Classifier Based on Gini Index Weight

Wenqian Shang; Youli Qu; Haibin Zhu; Houkuan Huang; Yongmin Lin; Hongbin Dong

In recent years, the kNN algorithm has attracted the attention of many researchers and has proved to be one of the best text categorization algorithms. Text categorization uses a training set of documents with assigned class labels to decide which class a new, unlabeled document belongs to. For a classifier, however, text preprocessing is the bottleneck of categorization. The original feature space always contains many thousands of words, so its dimension is very high. In this paper, we therefore adopt a new feature weighting method, an improved Gini index, to reduce the dimension of the feature space and improve categorization precision. In addition, we discuss improving the decision rule and dimension selection. We design an adaptive fuzzy kNN text classifier, where "adaptive" refers to adaptive dimension selection. The experimental results show that our algorithm is effective and feasible.


computer science and information engineering | 2011

Naive Bayesian Classifier Based on the Improved Feature Weighting Algorithm

Tao Dong; Wenqian Shang; Haibin Zhu

Text categorization is a fundamental methodology of text mining and has been a hot research topic in data mining and web mining in recent years. It plays an important role in building traditional information retrieval, web indexing architectures, Web information retrieval, and so on. This paper presents an improved text categorization algorithm that combines a feature weighting technique with Naive Bayesian classification. Experimental results show that using the improved Gini index algorithm for feature weighting can improve the performance of Naive Bayesian classifiers and increase the practical value of the sensitive information system.
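One common way to combine feature weights with Naive Bayes is to multiply each term's log-likelihood by its weight. The abstract does not say how the two are combined, so the sketch below is an assumption, with hypothetical names, toy data, and hand-picked weights standing in for Gini-derived ones.

```python
import math
from collections import Counter, defaultdict

def train_nb(docs, labels):
    """Count documents per class and term occurrences per class."""
    class_counts = Counter(labels)
    term_counts = defaultdict(Counter)
    vocab = set()
    for tokens, label in zip(docs, labels):
        term_counts[label].update(tokens)
        vocab.update(tokens)
    return class_counts, term_counts, vocab

def classify(tokens, model, weights):
    """Pick the class maximising log P(c) + sum_t w_t * log P(t | c).

    weights: term -> feature weight (1.0 if missing). Assumed combination
    rule; Laplace smoothing is used for P(t | c).
    """
    class_counts, term_counts, vocab = model
    n_docs = sum(class_counts.values())
    best, best_score = None, -math.inf
    for label in sorted(class_counts):
        total = sum(term_counts[label].values())
        score = math.log(class_counts[label] / n_docs)
        for term in tokens:
            if term not in vocab:
                continue  # ignore terms never seen in training
            p = (term_counts[label][term] + 1) / (total + len(vocab))
            score += weights.get(term, 1.0) * math.log(p)
        if score > best_score:
            best, best_score = label, score
    return best

# Hypothetical toy data; the weights stand in for Gini-derived values.
docs = [["goal", "team"], ["vote", "team"], ["vote", "election"]]
labels = ["sport", "politics", "politics"]
model = train_nb(docs, labels)
weights = {"goal": 1.0, "team": 0.5, "vote": 1.0, "election": 1.0}
print(classify(["goal", "team"], model, weights))
```

Down-weighting "team", which occurs in both classes, reduces its influence on the decision relative to the class-specific term "goal".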


computational sciences and optimization | 2011

The Key Technology Research of Intelligent Information Syndication

Wenqian Shang; Tong Wang; Rui Lv

With the development of networks, online information has been greatly enriched, and it is increasingly difficult to obtain the desired information. Search engines solve some problems but are far from satisfying people's needs. With the development of Web 2.0, RSS technology solves some problems and greatly reduces spam, but as more and more channels appear, users once again face information overload. In this paper, therefore, we use machine learning and effective text classification techniques to alleviate the information overload problem to a certain extent.


international conference natural language processing | 2008

A novel feature weight algorithm for text categorization

Wenqian Shang; Hongbin Dong; Haibin Zhu; Yongbin Wang

With the development of the Web, large numbers of documents are put onto the Internet. More and more digital libraries, news sources, and internal company data are available. Automatic text categorization is becoming increasingly important for dealing with massive data. However, text preprocessing is still the bottleneck of text categorization based on the vector space model (VSM). The result of text preprocessing directly affects the performance and precision of categorization. Moreover, feature selection and feature weighting are the major obstacles in text preprocessing. In this paper, we focus mainly on feature weighting. We present a novel feature weighting algorithm, TF-Gini, that can improve categorization performance significantly. The experimental results verify the effectiveness of this algorithm.
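The abstract names TF-Gini without giving its formula. By analogy with TF-IDF, one plausible reading is a term's relative frequency in the document multiplied by a per-term Gini score; the sketch below implements only that assumed reading, with hypothetical names and precomputed scores passed in as a dictionary.

```python
from collections import Counter

def tf_gini_vector(tokens, gini):
    """Weight each term by (relative term frequency) x (its Gini score).

    gini: term -> precomputed Gini score; unseen terms get 0.0.
    Assumption: TF-Gini read as a TF-IDF analogue, not the paper's formula.
    """
    counts = Counter(tokens)
    n = len(tokens)
    return {term: (count / n) * gini.get(term, 0.0) for term, count in counts.items()}

# Hypothetical scores: "goal" is class-specific, "team" is shared across classes.
gini = {"goal": 1.0, "team": 0.5}
vector = tf_gini_vector(["goal", "team", "team", "the"], gini)
print(vector)
```

Here the frequent but less discriminative "team" ends up with the same weight as the rarer, class-specific "goal", and the unscored word "the" drops to zero.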


systems, man and cybernetics | 2004

WebCom Miner - a system of trends analysis for company products

Wenqian Shang; Haibin Zhu; Houkuan Huang

With the advance of the Web, e-commerce is flourishing. More and more companies, governments, and individuals publish their information on the Internet. How can a company survive in this virtual world? It is an urgent problem. We therefore design a novel mining system, WebCom Miner, to help a company's decision-makers make scientifically grounded decisions. This work presents the architecture of the system, the function of each part, and the basic method. The aim is to support decision-makers in making sound decisions. Each part of the system produces a trend analysis report, and the system provides a friendly interface for communicating with the user. Every part of the system adopts a different mining algorithm, so that the system runs with high efficiency and precision.


computational sciences and optimization | 2011

Identification of Sensitive Information Based on Improved Naive Bayesian Classifier

Tao Dong; Wenqian Shang

In order to purify the Internet environment, identify unhealthy and malicious information within the mass of network information, and monitor websites efficiently, we use text preprocessing based on the vector space model together with the improved Naive Bayesian classifier to construct a system for identifying sensitive information. This system not only identifies and classifies sensitive information within the mass of network information, but also provides a practical system and program for monitoring websites.

Collaboration


Dive into Wenqian Shang's collaborations.

Top Co-Authors

Houkuan Huang, Beijing Jiaotong University
Yongmin Lin, Beijing Jiaotong University
Youli Qu, Beijing Jiaotong University
Hongbin Dong, Harbin Engineering University
Tao Dong, Communication University of China
Zhihai Wang, Beijing Jiaotong University
Hongjia Liu, Communication University of China
Ligu Zhu, Communication University of China
Liu Yang, Communication University of China