Publication


Featured research published by Jiangjiao Duan.


Expert Systems With Applications | 2009

A prediction algorithm for time series based on adaptive model selection

Jiangjiao Duan; Wei Wang; Jianping Zeng; Dongzhan Zhang; Baile Shi

Hidden Markov models (HMMs) have been used successfully to analyze various types of time series. To fit a time series with an HMM, the number of hidden states must be determined before the other parameters are learned, since it strongly affects the complexity and precision of the fitted HMM. However, this is difficult when there is not enough prior knowledge about the observed series, which leads to an increasing mean error during prediction. To overcome this shortcoming, a prediction algorithm for time series based on adaptive model selection, PAAMS, is proposed. In PAAMS, the model is dynamically updated when the mean prediction error increases. During the update, an automatic model selection method, AMSA, is applied to obtain the best number of hidden states and the other model parameters. AMSA is based on clustering, with the number of hidden states taken to be the number of clusters. The feasibility and effectiveness of the proposed prediction algorithm are explained. Experiments on an American stock price data set show that PAAMS achieves higher precision than previous studies on the same data set that used fixed-model techniques.
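The update trigger described in the abstract above (re-select the model when the mean prediction error increases) might be sketched as follows. This is only an illustration under stated assumptions: the class name `DriftTrigger`, the sliding window, and the `tolerance` factor are hypothetical and are not taken from the paper, which does not specify them here.

```python
from collections import deque


class DriftTrigger:
    """Signals when the rolling mean absolute error rises above a baseline.

    Hypothetical helper: the window size and tolerance factor are assumptions.
    """

    def __init__(self, window=5, tolerance=1.5):
        self.errors = deque(maxlen=window)
        self.tolerance = tolerance
        self.baseline = None  # mean error of the first full window

    def observe(self, predicted, actual):
        """Record one prediction; return True if the model should be re-selected."""
        self.errors.append(abs(predicted - actual))
        if len(self.errors) < self.errors.maxlen:
            return False  # not enough history yet
        mean_error = sum(self.errors) / len(self.errors)
        if self.baseline is None:
            self.baseline = mean_error
            return False
        if mean_error > self.tolerance * self.baseline:
            # Reset so that a new baseline is learned after the model update.
            self.errors.clear()
            self.baseline = None
            return True
        return False
```

When `observe` returns True, the caller would run its model selection step (AMSA in the paper) and continue predicting with the new model.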


Expert Systems With Applications | 2012

Topics modeling based on selective Zipf distribution

Jianping Zeng; Jiangjiao Duan; Wenjun Cao; Chengrong Wu

Automatically mining topics from a text corpus has become an important foundation for many topic analysis tasks, such as opinion recognition and Web content classification. Although a large number of topic models and topic mining methods have been proposed for different purposes and have shown success in topic analysis tasks, more accurate models and mining algorithms are still desired for many applications. A general criterion based on computing a Zipf fitness quantity is proposed to determine whether a topic description is well-formed. Based on this quantity, the popular Dirichlet prior on multinomial parameters is found not to always produce well-formed topic descriptions. Hence, topic modeling based on LDA with selective Zipf documents as the training dataset is proposed to improve the quality of the generated topic descriptions. Experiments on two standard text corpora, the AP dataset and Reuters-21578, show that the modeling method based on selective Zipf distribution achieves better perplexity, which means better ability in predicting topics. A test of topic extraction on a collection of news documents about the recent financial crisis shows that the descriptive keywords in the topics are more meaningful and reasonable than those of traditional topic mining methods.


Expert Systems With Applications | 2010

A new distance measure for hidden Markov models

Jianping Zeng; Jiangjiao Duan; Chengrong Wu

Hidden Markov models (HMMs) have been found useful in modeling complex time series in various applications. An appropriate distance measure between two HMMs is of theoretical interest and is also important in HMM-based applications. The Kullback-Leibler (KL) and modified KL divergences are usually used as distance measures between two HMMs. However, these measures do not satisfy the necessary properties of a distance measure, such as the triangle inequality. A novel distance measure, based on the HMM stationary cumulative distribution function, is proposed to discriminate between two HMMs. It is proved that the measure fulfills the required properties. The distance measure is evaluated by comparison with the KL distance in experiments on a series of models. Clustering on both synthesized and real-world data is also performed with the new distance and the KL distance, respectively. The results show that the proposed distance is more effective and reasonable in discriminating HMMs.
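To illustrate the general idea of comparing HMMs through a stationary cumulative distribution, the sketch below assumes discrete-emission HMMs given as a transition matrix `A` and an emission matrix `B`, and uses the L1 difference between the two stationary observation CDFs. The paper's exact construction is not given in the abstract and may differ; all function names are hypothetical.

```python
def stationary_distribution(A, iterations=1000):
    """Approximate the stationary state distribution by power iteration.

    Assumes the chain is ergodic and aperiodic so the iteration converges.
    """
    n = len(A)
    pi = [1.0 / n] * n
    for _ in range(iterations):
        pi = [sum(pi[i] * A[i][j] for i in range(n)) for j in range(n)]
    return pi


def stationary_cdf(A, B):
    """CDF over observation symbols when the hidden chain is stationary."""
    pi = stationary_distribution(A)
    num_symbols = len(B[0])
    probs = [sum(pi[i] * B[i][k] for i in range(len(B))) for k in range(num_symbols)]
    cdf, total = [], 0.0
    for p in probs:
        total += p
        cdf.append(total)
    return cdf


def cdf_distance(hmm1, hmm2):
    """L1 distance between stationary CDFs; each HMM is a tuple (A, B)."""
    f1 = stationary_cdf(*hmm1)
    f2 = stationary_cdf(*hmm2)
    return sum(abs(a - b) for a, b in zip(f1, f2))
```

Because it is an L1 norm of a difference, this quantity is symmetric and satisfies the triangle inequality, unlike the KL divergence. It is only a pseudo-metric on HMMs, since two different models can share the same stationary observation distribution.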


Proceedings of the 2014 IEEE/WIC/ACM International Joint Conferences on Web Intelligence (WI) and Intelligent Agent Technologies (IAT) | 2014

Identification of Opinion Leaders Based on User Clustering and Sentiment Analysis

Jiangjiao Duan; Jianping Zeng; Banghui Luo

Opinion leaders play an important role in influencing the topics of discussion among a group of people. Hence, the identification of opinion leaders has received recent attention. Specifically, discovering opinion leaders in a Web-based stock message board might be valuable for many investors. Current methods for finding opinion leaders mainly concentrate on a graph of user connections, and thus require a large amount of computation. In addition, opinions in user messages are usually ignored, so the effectiveness of finding opinion leaders is very limited. In this paper, a new method is proposed to recognize opinion leaders in Web-based stock message boards. We combine a clustering algorithm and sentiment analysis to address the two problems in current methods. Features of user activities are calculated from the messages posted on the board; a clustering algorithm is then applied to the user data to generate clusters that contain potential opinion leaders. Next, we apply sentiment analysis to the candidates and associate the sentiment with the actual price movement trend. By this means, opinion leaders can be discovered effectively, since a good ability to analyze the stock market is considered a skill of influential users. Comparative experiments are conducted on a data set that contains real discussions and stock messages, and the effectiveness of the proposed method is evaluated.


Expert Systems With Applications | 2013

Web objectionable text content detection using topic modeling technique

Jiangjiao Duan; Jianping Zeng

Web 2.0 technologies have made it easy for Web users to create and spread objectionable text content, which has been shown to be harmful to Web users, especially young children. Although detection methods based on keyword lists offer faster detection and lower memory consumption, they fail to detect text content that is objectionable at the semantic level. A framework that integrates a semantic model with a detection method is proposed to perform probabilistic inference for detecting this kind of Web text content. Based on the observation that an objectionable scene can be described by a set of sentences, a topic model learnt from the set is employed as a semantic model of the objectionable scene. For a given sentence, the framework calculates a probability value that indicates the likelihood of the sentence with respect to the model. A mapping function then transforms the probability value into a new indicator that is convenient for making the final decision. Extensive comparison experiments on two real-world text sets show that the framework can effectively recognize semantically objectionable text, and that both the detection rate and the false alarm rate are superior to those of traditional methods.


Expert Systems With Applications | 2011

Semantic multi-grain mixture topic model for text analysis

Jianping Zeng; Jiangjiao Duan; Wei Wang; Chengrong Wu

Granular topic extraction and modeling are fundamental tasks in text analysis. Hierarchical topic clustering algorithms and hierarchical topic models are usually employed for these purposes. However, it is difficult to distinguish clearly between each pair of hierarchical topics from the point of view of semantic granularity. STG (semantic topic granularity) is proposed to indicate the level of detail of a topic description, with the aim of discriminating between topics from a semantic perspective. A new model based on STG, mgMTM (multi-grain mixture topic model), is then proposed to model grain topics. The DCT (discrete cosine transform) is employed to provide a mechanism for computing STG, extracting grain topics, and learning mgMTM. Experiments on real-world datasets show that the proposed model has a lower perplexity score than the LDA model and thus better generalization performance in describing text. Experiments also show that the descriptions of the extracted grain topics can be well explained with respect to a dataset that includes topics about the recent global financial crisis.


Fuzzy Systems and Knowledge Discovery | 2013

Mining opinion and sentiment for stock return prediction based on Web-forum messages

Jiangjiao Duan; Jianping Zeng

Stock return prediction has drawn extensive attention in recent years. Time series-based methods of many kinds are commonly used to predict future stock returns from the statistical properties of the series. As more and more people gather in Web-based forums, the sentiment and opinion expressed there are likely to become new indicators of the movement of stock returns. We propose a novel method to forecast stock returns by mining opinion and sentiment from Web forum messages. Opinion about the drop and rise of stock prices is first extracted from the messages posted by forum users. Unhealthy sentiment is then recognized by means of pattern matching. A Bayesian model that incorporates opinion and unhealthy sentiment is established to infer the relation between stock returns and the combination of opinion and sentiment. Comparative experiments on the China A-share stock market and the Guba Web forum show that the proposed method is effective.


International Conference on Anti-counterfeiting, Security, and Identification | 2012

Hierarchical semantic model for objectionable Web text content detection

Jiangjiao Duan; Jianping Zeng; Shiyong Zhang

Objectionable Web text content has recently become widespread on many websites. Since this kind of text content has been shown to be very harmful to young children, several measures have been taken to detect it. Unlike current methods, a scene-based method is proposed to recognize objectionable text, with the aim of improving performance, especially in semantic detection. A scene, defined by a set of sentences, is taken as the topic of the objectionable content. A hierarchical semantic model that can describe the scene at different granularities is then learnt from the sentence set. Objectionable Web text detection is performed based on the similarity between the text and the model. Experiments on real-world text sets taken from Web forums show that the proposed method achieves better performance than a keyword-based method with semantic feature selection. The ability to detect semantically objectionable text is studied by varying several key parameters of the model.


Intelligence and Security Informatics | 2011

Topic discovery based on dual EM merging

Jianping Zeng; Jiangjiao Duan; Chengrong Wu

Given the enormous amount of text on the Internet, automatic topic discovery from large text corpora has become an important task for advanced intelligence information analysis, such as opinion recognition and Web user interest analysis. Although many topic mining methods have shown great success in topic-based analysis tasks, discovering meaningful topic descriptions for informatics analysis is still desired. To avoid explaining a topic with words of different granularity, a mechanism for separating a text corpus into two subsets with equal semantic topics is proposed. The EM algorithm is employed to infer topic models for the subsets. A merging process is then devised to generate topic descriptions from the output of EM. Experiments on the standard AP text corpus show that the proposed topic discovery method achieves better perplexity, which means better ability in predicting topics. Furthermore, a test of topic extraction on a collection of news documents about the recent Expo 2010 Shanghai China shows that the descriptive keywords in the topics are more meaningful and reasonable than those of traditional topic mining methods.


Fuzzy Systems and Knowledge Discovery | 2009

A Method for Determination on HMM Distance Threshold

Jiangjiao Duan; Jianping Zeng; Dongzhan Zhang

Hidden Markov models (HMMs) are widely used in time series modeling. It is usually necessary to calculate a sequence’s likelihood w.r.t. an HMM to evaluate the similarity between the sequence and the HMM. Hence, a method is required for selecting the best threshold value to determine whether the sequence is well approximated by the model. However, this is usually done manually. Here, we provide a method (HTDM) to determine the threshold automatically. Based on likelihood statistics, we conclude that the likelihood follows a normal distribution, and the standard deviation of the distribution is then estimated. The distance threshold value can then be obtained from the “three sigma” rule. In the experiments, we compare the performance of the HMM-based hierarchical clustering algorithm HHCH using HTDM with that of the algorithm HBHCTS, in which the threshold is set manually. The experimental results show that the proposed method is effective on both synthetic and real-world datasets.
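The "three sigma" step in the abstract above can be sketched in a few lines. This is a minimal illustration, assuming the threshold is a one-sided lower bound (mean minus three standard deviations) on log-likelihoods of sequences known to fit the model; the function names and sample values are hypothetical, and the paper's HTDM procedure may differ in detail.

```python
import statistics


def three_sigma_threshold(log_likelihoods):
    """Lower bound mean - 3 * std over reference log-likelihoods.

    Assumes the values are roughly normally distributed.
    """
    mu = statistics.fmean(log_likelihoods)
    sigma = statistics.stdev(log_likelihoods)  # sample standard deviation
    return mu - 3.0 * sigma


def is_well_approximated(log_likelihood, threshold):
    """True if a sequence's log-likelihood does not fall below the threshold."""
    return log_likelihood >= threshold


# Made-up log-likelihoods of sequences that the model is known to fit.
reference = [-10.2, -9.8, -10.5, -10.0, -9.9, -10.1]
threshold = three_sigma_threshold(reference)
```

Under the normality assumption, about 99.7% of well-fitting sequences lie within three standard deviations of the mean, so a sequence scoring below `threshold` is unlikely to belong to the model.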

Collaboration


Dive into Jiangjiao Duan's collaborations.

Top Co-Authors


Wei Wang

Chinese Academy of Sciences
