Chengrong Wu | Researchain

Publication

Featured research published by Chengrong Wu.

Knowledge-Based Systems | 2008

A framework for WWW user activity analysis based on user interest

Jianping Zeng; Shiyong Zhang; Chengrong Wu

User activity plays an important role in the operation of many kinds of websites. Researchers in many Web information processing fields, such as Web log mining and blog social network analysis, have made significant progress on user activity modeling. However, user activity on many interactive websites, such as BBS and discussion-group sites, has attracted little research attention. The integration of the software modules responsible for user activity analysis is also a critical issue. We propose a framework that can be easily implemented to analyze user activity on an interactive website. In the framework, the user activity model is represented by a hidden Markov model (HMM), and a method for computing user interest is provided. User activity analysis tasks, such as user group discovery, can be performed within the framework. In the experiments, we investigate three problems related to user interest and user activity. The results show that the framework is helpful for user activity analysis and that the proposed user interest measure describes user activity on an interactive website well, so it can serve as an effective measure for analyzing user activity.
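The abstract does not give the user interest formula, so the sketch below only illustrates the HMM building block: scoring how likely a sequence of observed user actions is under a given model, using the standard forward algorithm. The meaning of the hidden states and action symbols (for example, 0 = read a post, 1 = reply) is a hypothetical assumption for illustration, not the paper's encoding.

```python
from typing import Sequence


def forward_likelihood(pi: Sequence[float],
                       A: Sequence[Sequence[float]],
                       B: Sequence[Sequence[float]],
                       obs: Sequence[int]) -> float:
    """Return P(obs | HMM) with the forward algorithm.

    pi[i]   : initial probability of hidden state i
    A[i][j] : probability of moving from hidden state i to j
    B[i][k] : probability of emitting action symbol k in state i
    obs     : observed action symbols (hypothetical encoding, e.g. 0 = read, 1 = reply)

    Unscaled version: for long sequences, scale alpha or use logs to avoid underflow.
    """
    n = len(pi)
    # alpha[i] = P(o_1..o_t, hidden state at time t = i)
    alpha = [pi[i] * B[i][obs[0]] for i in range(n)]
    for o in obs[1:]:
        alpha = [sum(alpha[i] * A[i][j] for i in range(n)) * B[j][o]
                 for j in range(n)]
    return sum(alpha)
```

Comparing these likelihoods across users (or across per-group models) is one plausible way to feed an HMM into tasks such as user group discovery, though the paper's own procedure may differ.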

Fuzzy Systems and Knowledge Discovery | 2007

Predictive Model for Internet Public Opinion

Jianping Zeng; Shiyong Zhang; Chengrong Wu; Jianfeng Xie

The Internet is becoming a platform for spreading public opinion, so it is important to model Internet public opinion activity as accurately as possible. A hidden Markov model (HMM) is introduced to describe this activity: the state of Internet public opinion is represented as the hidden state of the HMM, and its characteristics are treated as the visible (observed) states. The HMM is then used to predict changes in these characteristics with two different algorithms, chosen according to the amount of characteristic data gathered. Experiments on Internet public opinion data show that the modeling method and the prediction algorithms are effective.
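As a rough illustration of HMM-based prediction, the sketch below filters the hidden opinion state from the characteristics observed so far and then propagates one step ahead to obtain a distribution over the next observed characteristic. It does not reproduce either of the paper's two algorithms; treating characteristics as discrete levels (for example, low/medium/high posting volume) is an assumption.

```python
def predict_next_observation(pi, A, B, obs):
    """One-step-ahead prediction of the next observation symbol.

    pi, A, B : initial, transition and emission probabilities of the HMM
    obs      : observed characteristic levels so far (discrete symbols, assumed)
    Returns a list P(o_{t+1} = k | o_1..o_t) over all symbols k.
    """
    n, m = len(pi), len(B[0])

    def normalise(v):
        z = sum(v)
        if z == 0:
            raise ValueError("observation sequence has zero probability under the model")
        return [x / z for x in v]

    # Normalised forward pass: belief[i] = P(hidden state_t = i | o_1..o_t)
    belief = normalise([pi[i] * B[i][obs[0]] for i in range(n)])
    for o in obs[1:]:
        belief = normalise([sum(belief[i] * A[i][j] for i in range(n)) * B[j][o]
                            for j in range(n)])
    # Propagate the hidden state one step, then map it to observations.
    next_state = [sum(belief[i] * A[i][j] for i in range(n)) for j in range(n)]
    return [sum(next_state[j] * B[j][k] for j in range(n)) for k in range(m)]
```

Normalising at every step keeps the belief vector well scaled, which matters once the observed history grows long.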

Expert Systems With Applications | 2012

Topics modeling based on selective Zipf distribution

Jianping Zeng; Jiangjiao Duan; Wenjun Cao; Chengrong Wu

Automatically mining topics from a text corpus has become an important foundation for many topic analysis tasks, such as opinion recognition and Web content classification. Although many topic models and topic mining methods have been proposed for different purposes and have succeeded in topic analysis tasks, many applications still need more accurate models or mining algorithms. A general criterion based on computing a Zipf fitness quantity is proposed to determine whether a topic description is well-formed. Using this quantity, we find that the popular Dirichlet prior on multinomial parameters does not always produce well-formed topic descriptions. Hence, topic modeling based on LDA with selective Zipf documents as the training dataset is proposed to improve the quality of the generated topic descriptions. Experiments on two standard text corpora, the AP dataset and Reuters-21578, show that the modeling method based on the selective Zipf distribution achieves better perplexity, which indicates a better ability to predict topics. A topic extraction test on a collection of news documents about the recent financial crisis also shows that the key words describing the topics are more meaningful and reasonable than those produced by a traditional topic mining method.
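The abstract does not define the Zipf fitness quantity. A common way to test how Zipf-like a word distribution is, and the assumption behind this sketch, is to fit a straight line to log(frequency) versus log(rank) and report the slope and the coefficient of determination R²; a slope near -1 with R² near 1 indicates a good Zipf fit.

```python
import math
from collections import Counter


def zipf_fitness(tokens):
    """Least-squares fit of log(frequency) against log(rank).

    Returns (slope, r_squared). This is an assumed stand-in for the paper's
    Zipf fitness quantity, not its published definition.
    """
    freqs = sorted(Counter(tokens).values(), reverse=True)
    if len(freqs) < 2:
        raise ValueError("need at least two distinct words")
    xs = [math.log(rank) for rank in range(1, len(freqs) + 1)]
    ys = [math.log(f) for f in freqs]
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = my - slope * mx
    ss_res = sum((y - (intercept + slope * x)) ** 2 for x, y in zip(xs, ys))
    ss_tot = sum((y - my) ** 2 for y in ys)
    # A perfectly flat distribution has no rank-frequency decay: report no fit.
    r_squared = 1.0 - ss_res / ss_tot if ss_tot > 0 else 0.0
    return slope, r_squared
```

Under this assumption, a "selective Zipf" training set could be built by keeping documents whose R² exceeds a chosen threshold; the threshold value would be a tuning parameter.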

Expert Systems With Applications | 2010

A new distance measure for hidden Markov models

Jianping Zeng; Jiangjiao Duan; Chengrong Wu

Hidden Markov models (HMMs) are useful for modeling complex time series in various applications. An appropriate distance measure between two HMMs is of theoretical interest and is also important in HMM-based applications. The Kullback-Leibler (KL) divergence and modified KL measures are commonly used as distance measures between two HMMs. However, these measures do not satisfy the necessary properties of a distance measure, such as the triangle inequality. A novel distance measure based on the HMM stationary cumulative distribution function is proposed to discriminate between two HMMs, and it is proved to satisfy these properties. The measure is evaluated against the KL distance in experiments on a series of models. Clustering on both synthetic and real-world data is also performed with the new distance and with the KL distance. The results show that the proposed distance is more effective and reasonable for discriminating HMMs.
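A minimal sketch of the idea for discrete-observation HMMs, under stated assumptions: the stationary distribution of the hidden chain is found by power iteration (assuming an irreducible, aperiodic transition matrix), mixed through the emission matrix to give a stationary observation distribution, and accumulated into a CDF over the ordered symbols. Comparing two CDFs with an L1 difference is an assumed choice, not necessarily the paper's exact formula; it does satisfy symmetry and the triangle inequality, although two different HMMs with identical stationary CDFs would get distance 0.

```python
def stationary_distribution(A, iters=10_000, tol=1e-12):
    """Power iteration for the stationary distribution of transition matrix A.

    Assumes the chain is irreducible and aperiodic so the iteration converges.
    """
    n = len(A)
    p = [1.0 / n] * n
    for _ in range(iters):
        q = [sum(p[i] * A[i][j] for i in range(n)) for j in range(n)]
        if max(abs(a - b) for a, b in zip(p, q)) < tol:
            return q
        p = q
    return p


def stationary_cdf(A, B):
    """CDF over observation symbols 0..m-1 of the stationary emission distribution."""
    pi = stationary_distribution(A)
    m = len(B[0])
    probs = [sum(pi[i] * B[i][k] for i in range(len(pi))) for k in range(m)]
    cdf, running = [], 0.0
    for p in probs:
        running += p
        cdf.append(running)
    return cdf


def cdf_distance(model1, model2):
    """L1 distance between stationary CDFs; each model is a pair (A, B)."""
    f1, f2 = stationary_cdf(*model1), stationary_cdf(*model2)
    return sum(abs(a - b) for a, b in zip(f1, f2))
```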

Expert Systems With Applications | 2010

Multi-grain hierarchical topic extraction algorithm for text mining

Jianping Zeng; Chengrong Wu; Wei Wang

Topic extraction from a text corpus is the foundation of many topic analysis tasks, such as topic trend prediction and opinion extraction. Since topics are characteristically hierarchical, a topic extraction algorithm should preferably output topic descriptions with this kind of structure. However, the hierarchical topic structures extracted by most current topic analysis algorithms cannot provide a meaningful description for every subtopic in the hierarchical tree. Here, we propose a new hierarchical topic extraction algorithm based on topic grain computation. Treating the distribution of word document frequency as a Gaussian mixture, an EM-like algorithm is used to find the best number of mixture components and the mean of each component. Topic grain is then defined from the Gaussian mixture parameters, and feature words are selected for each grain. A clustering algorithm is applied to the text set converted according to these feature words. By repeatedly applying the clustering algorithm to differently converted text sets, a multi-grain hierarchical topic structure is extracted in which each subtopic is described by its own feature words. Experiments on two real-world datasets collected from a news website show that the proposed algorithm generates more meaningful multi-grain topic structures than current hierarchical topic clustering algorithms.
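The mixture-fitting step can be sketched with plain EM for a one-dimensional Gaussian mixture over word document frequencies. This sketch fixes the number of components k; the paper's EM-like procedure also selects the best k, and its definition of topic grain from the fitted parameters is not reproduced here.

```python
import math


def gmm_em_1d(xs, k, iters=200):
    """Fit a k-component 1-D Gaussian mixture to xs with plain EM.

    Returns (weights, means, variances). k is chosen by the caller here.
    """
    xs = sorted(xs)
    n = len(xs)
    mean_all = sum(xs) / n
    # Initialise: means at evenly spaced quantiles, shared overall variance.
    means = [xs[int((j + 0.5) * n / k)] for j in range(k)]
    var0 = sum((x - mean_all) ** 2 for x in xs) / n or 1.0
    variances = [var0] * k
    weights = [1.0 / k] * k
    for _ in range(iters):
        # E-step: responsibility of each component for each point.
        resp = []
        for x in xs:
            dens = [w * math.exp(-(x - m) ** 2 / (2 * v)) / math.sqrt(2 * math.pi * v)
                    for w, m, v in zip(weights, means, variances)]
            total = sum(dens) or 1e-300  # guard against total underflow
            resp.append([d / total for d in dens])
        # M-step: re-estimate weights, means and variances.
        for j in range(k):
            nj = sum(r[j] for r in resp) + 1e-12
            weights[j] = nj / n
            means[j] = sum(r[j] * x for r, x in zip(resp, xs)) / nj
            variances[j] = max(
                sum(r[j] * (x - means[j]) ** 2 for r, x in zip(resp, xs)) / nj,
                1e-6)  # variance floor avoids collapse onto a single point
    return weights, means, variances
```

The fitted component means could then serve as document-frequency levels that separate broad (high-frequency) from specific (low-frequency) feature words, which is the kind of information a grain definition would build on.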

Expert Systems With Applications | 2010

Tag tree template for Web information and schema extraction

Xiangwen Ji; Jianping Zeng; Shiyong Zhang; Chengrong Wu

Extracting information from the Web is both interesting and challenging, and it can benefit Web search, information retrieval, and Web mining. Web pages on many sites are generated dynamically from a background database as structured records filled into an HTML template. To efficiently extract meaningful information, including records and the data schema, from such pages, a new method based on a tag tree template is proposed. Web pages from different websites are parsed into tag trees, and a template for each site is then generated from these trees using a cost-based tree similarity measure. The exclusive content of each page is then extracted by parsing the page with the template. Finally, the records in the pages and their schema are extracted from the exclusive content by finding repeating patterns and applying heuristic rules. Extraction experiments on 360 pages from 12 websites show that the proposed method is an effective way to extract meaningful information.
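The abstract mentions a cost-based tree similarity measure without defining it. As a swapped-in illustration of comparing tag trees, the sketch below uses Simple Tree Matching (Yang, 1991), which counts the largest top-down matching of same-tag nodes, normalised by the two tree sizes. The `TagNode` class and the normalisation are assumptions for this example, not the paper's method.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class TagNode:
    """Hypothetical minimal tag-tree node: an HTML tag name and its children."""
    tag: str
    children: list[TagNode] = field(default_factory=list)


def simple_tree_matching(a: TagNode, b: TagNode) -> int:
    """Size of the maximum top-down matching between two tag trees."""
    if a.tag != b.tag:
        return 0
    m, n = len(a.children), len(b.children)
    # M[i][j]: best matching using the first i children of a and first j of b,
    # preserving child order (dynamic programming as in sequence alignment).
    M = [[0] * (n + 1) for _ in range(m + 1)]
    for i in range(1, m + 1):
        for j in range(1, n + 1):
            M[i][j] = max(M[i - 1][j], M[i][j - 1],
                          M[i - 1][j - 1]
                          + simple_tree_matching(a.children[i - 1], b.children[j - 1]))
    return M[m][n] + 1  # +1 for matching the two roots


def tree_size(t: TagNode) -> int:
    return 1 + sum(tree_size(c) for c in t.children)


def normalized_similarity(a: TagNode, b: TagNode) -> float:
    """Similarity in [0, 1]: matched nodes relative to the average tree size."""
    return 2 * simple_tree_matching(a, b) / (tree_size(a) + tree_size(b))
```

Nodes that match across most pages of a site are natural template candidates, while unmatched subtrees hold the page-specific (exclusive) content.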

Mathematical and Computer Modelling of Dynamical Systems | 2009

Modelling topic propagation over the Internet

Jianping Zeng; Shiyong Zhang; Chengrong Wu; Xiangwen Ji

With the rapid growth of the Internet, content security is becoming more intractable because of the emergence of complex content and the diversity of human activity online. This article proposes a model for the dynamics of topic propagation over the Internet. Topics on the Internet are treated as clusters of website content that describe various kinds of events. The model accounts for website behaviors such as anti-infection ability, recovery ability, spreading ability, and effective propagation rate. A new topic diffusion mechanism, which incorporates a Markov model based on topic activity transitions, is used in the model. Through simulations, we explore the time-dependent spreading of topics in directed scale-free networks, in which nodes represent websites and directed links represent source dependencies between websites. The simulation results agree well with actual observations.
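The simulation setting can be sketched as follows, under simplifying assumptions: a directed scale-free graph is grown by preferential attachment, and a topic spreads along "source to republishing site" edges with a discrete-time SIR-style rule (spreading probability `beta`, recovery probability `gamma`, both hypothetical parameters). This is a generic epidemic-style model, not the paper's Markov diffusion mechanism or its website ability parameters.

```python
import random


def scale_free_digraph(n, m, rng):
    """Grow a directed preferential-attachment graph.

    Each new site v picks min(m, v) distinct earlier sites as content sources,
    with probability proportional to 1 + (number of sites already following them).
    Returns followers[u]: sites that republish content from u, i.e. directed
    edges u -> v along which a topic can spread.
    """
    followers = {u: [] for u in range(n)}
    for v in range(1, n):
        candidates = list(range(v))
        weights = [1 + len(followers[u]) for u in candidates]
        sources = set()
        while len(sources) < min(m, v):
            sources.add(rng.choices(candidates, weights=weights)[0])
        for u in sources:
            followers[u].append(v)
    return followers


def simulate_topic(followers, seed_site, beta, gamma, steps, rng):
    """Discrete-time SIR-style spread; returns the number of active sites per step.

    S = not yet carrying the topic, I = actively spreading it, R = recovered.
    """
    state = {v: "S" for v in followers}
    state[seed_site] = "I"
    history = [1]
    for _ in range(steps):
        new_state = dict(state)
        for u, s in state.items():
            if s != "I":
                continue
            for v in followers[u]:
                if state[v] == "S" and rng.random() < beta:
                    new_state[v] = "I"
            if rng.random() < gamma:
                new_state[u] = "R"
        state = new_state
        history.append(sum(1 for s in state.values() if s == "I"))
    return history
```

Passing an explicit `random.Random` instance keeps runs reproducible, which helps when comparing simulated curves against observed topic activity.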

Expert Systems With Applications | 2011

Semantic multi-grain mixture topic model for text analysis

Jianping Zeng; Jiangjiao Duan; Wei Wang; Chengrong Wu

Granular topic extraction and modeling are fundamental tasks in text analysis. Hierarchical topic clustering algorithms and hierarchical topic models are usually employed for these purposes. However, it is difficult to clearly distinguish each pair of hierarchical topics from the point of view of semantic granularity. STG (semantic topic granularity) is proposed to indicate the level of detail of a topic description and to discriminate between topics from a semantic perspective. A new model, mgMTM (multi-grain mixture topic model), is then proposed based on STG to model grain topics. The DCT (discrete cosine transform) provides a mechanism for computing STG, extracting grain topics, and learning mgMTM. Experiments on real-world datasets show that the proposed model has a lower perplexity score than the LDA model and thus generalizes better in describing text. Experiments on a dataset that includes topics about the recent global financial crisis also show that the descriptions of the extracted grain topics are readily interpretable.
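The abstract does not say how DCT coefficients map to STG. The sketch below only shows the DCT-II computation itself on a word-weight vector, plus a hypothetical granularity proxy (the share of spectral energy at or above a cutoff frequency); both the proxy and the cutoff are illustrative assumptions.

```python
import math


def dct_ii(x):
    """Unnormalised DCT-II: X_k = sum_n x_n * cos(pi / N * (n + 1/2) * k)."""
    N = len(x)
    return [sum(x[n] * math.cos(math.pi / N * (n + 0.5) * k) for n in range(N))
            for k in range(N)]


def high_frequency_energy_ratio(x, cutoff):
    """Hypothetical granularity proxy: share of DCT energy in coefficients k >= cutoff."""
    X = dct_ii(x)
    total = sum(c * c for c in X)
    return sum(c * c for c in X[cutoff:]) / total if total else 0.0
```

A smooth, evenly spread weight vector concentrates its energy in the low-frequency coefficients, while a sharply varying one pushes energy toward high frequencies; whether that corresponds to coarse versus fine topics is the assumption behind the proxy.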

Fuzzy Systems and Knowledge Discovery | 2009

Hierarchical Clustering for Topic Analysis Based on Variable Feature Selection

Jianping Zeng; Linghui Gong; Qinqin Wang; Chengrong Wu

A hierarchical topic structure expresses topics in a natural way that suits human-machine interfaces. However, the hierarchical topic structures extracted by most topic analysis algorithms cannot present a meaningful description for every subtopic in the hierarchical tree. We propose a new hierarchical clustering algorithm based on variable feature selection at each level of the hierarchical structure. The algorithm uses a top-down strategy to extract subtopics and sets up the relations between topics in neighboring levels based on the number of documents they share. The number of levels in the hierarchy is determined by the frequency of the selected word features. Experiments on a real-world dataset collected from a news website show that the proposed algorithm generates a more meaningful topic structure than current hierarchical topic clustering algorithms.
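The linking step between neighboring levels can be sketched by attaching each subtopic to the parent topic with which it shares the most documents. Representing topics as sets of document IDs, and letting the first parent in iteration order win ties, are assumptions of this example.

```python
def link_levels(parents, children):
    """Attach each child topic to the parent sharing the most documents.

    parents, children: dict mapping topic id -> set of document ids.
    Returns dict child id -> parent id, or None when no documents overlap.
    Ties go to the first parent encountered in iteration order.
    """
    links = {}
    for cid, cdocs in children.items():
        best, best_overlap = None, 0
        for pid, pdocs in parents.items():
            overlap = len(cdocs & pdocs)
            if overlap > best_overlap:
                best, best_overlap = pid, overlap
        links[cid] = best
    return links
```

For example, with parents `{"economy": {1, 2, 3, 4}, "sports": {5, 6, 7}}` and children `{"stocks": {1, 2}, "football": {5, 6}}`, "stocks" attaches to "economy" and "football" to "sports".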

Intelligence and Security Informatics | 2011

Topic discovery based on dual EM merging

Jianping Zeng; Jiangjiao Duan; Chengrong Wu

Given the enormous amount of text on the Internet, automatic topic discovery from large text corpora has become an important task for advanced intelligence analysis, such as opinion recognition and Web user interest analysis. Although many topic mining methods have been very successful in topic-based analysis tasks, discovering meaningful topic descriptions for informatics analysis remains desirable. To avoid explaining a topic with words of different granularity, a mechanism is proposed for splitting the text corpus into two subsets with equivalent semantic topics. The EM algorithm is used to infer topic models for the subsets, and a merging process then generates topic descriptions from the EM output. Experiments on the standard AP text corpus show that the proposed topic discovery method achieves better perplexity, which indicates a better ability to predict topics. Furthermore, a topic extraction test on a collection of news documents about the recent Expo 2010 Shanghai China shows that the key words describing the topics are more meaningful and reasonable than those produced by a traditional topic mining method.
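The abstract does not detail the merging process. One simple possibility, assumed here for illustration, is to pair each topic from the first subset's model with its most similar unused topic from the second (greedy matching on cosine similarity of word distributions over a shared vocabulary) and average each pair.

```python
import math


def cosine(p, q):
    """Cosine similarity between two word-probability vectors."""
    dot = sum(a * b for a, b in zip(p, q))
    norm = math.sqrt(sum(a * a for a in p)) * math.sqrt(sum(b * b for b in q))
    return dot / norm if norm else 0.0


def merge_topics(topics_a, topics_b):
    """Greedily pair topics from two EM runs and average each matched pair.

    topics_a, topics_b: lists of word distributions over the same vocabulary
    (an assumption). Pairs are taken in order of decreasing similarity, and
    each topic is used at most once.
    """
    pairs = sorted(((cosine(a, b), i, j)
                    for i, a in enumerate(topics_a)
                    for j, b in enumerate(topics_b)), reverse=True)
    used_a, used_b, merged = set(), set(), []
    for _, i, j in pairs:
        if i in used_a or j in used_b:
            continue
        used_a.add(i)
        used_b.add(j)
        merged.append([(x + y) / 2 for x, y in zip(topics_a[i], topics_b[j])])
    return merged
```

Greedy matching is easy to follow but not globally optimal; an assignment solver such as the Hungarian algorithm would maximise total similarity across all pairs.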
