Peifa Jia
Tsinghua University
Publication
Featured research published by Peifa Jia.
web search and data mining | 2011
Zhongwu Zhai; Bing Liu; Hua Xu; Peifa Jia
In sentiment analysis of product reviews, one important problem is to produce a summary of opinions based on product features/attributes (also called aspects). However, for the same feature, people can express it with many different words or phrases. To produce a useful summary, these words and phrases, which are domain synonyms, need to be grouped under the same feature group. Although several methods have been proposed to extract product features from reviews, limited work has been done on clustering or grouping of synonym features. This paper focuses on this task. Classic methods for solving this problem are based on unsupervised learning using some forms of distributional similarity. However, we found that these methods do not do well. We then model it as a semi-supervised learning problem. Lexical characteristics of the problem are exploited to automatically identify some labeled examples. Empirical evaluation shows that the proposed method outperforms existing state-of-the-art methods by a large margin.
knowledge discovery and data mining | 2011
Zhongwu Zhai; Bing Liu; Hua Xu; Peifa Jia
In opinion mining of product reviews, one often wants to produce a summary of opinions based on product features. However, for the same feature, people can express it with different words and phrases. To produce an effective summary, these words and phrases, which are domain synonyms, need to be grouped under the same feature. Topic modeling is a suitable method for the task. However, instead of simply letting topic modeling find groupings freely, we believe it is possible to do better by giving it some pre-existing knowledge in the form of automatically extracted constraints. In this paper, we first extend a popular topic modeling method, called Latent Dirichlet Allocation (LDA), with the ability to process large scale constraints. Then, two novel methods are proposed to extract two types of constraints automatically. Finally, the resulting constrained-LDA and the extracted constraints are applied to group product features. Experiments show that constrained-LDA outperforms the original LDA and the latest mLSA by a large margin.
Expert Systems With Applications | 2011
Zhongwu Zhai; Hua Xu; Bada Kang; Peifa Jia
Features play a fundamental role in sentiment classification. How to effectively select different types of features to improve sentiment classification performance is the primary topic of this paper. N-gram features are commonly employed in text classification tasks; in this paper, sentiment words, substrings, substring-groups, and key-substring-groups, which have never been considered in the sentiment classification area before, are also extracted as features. The extracted features are then compared and analyzed. To demonstrate generality, we use two authoritative Chinese data sets in different domains to conduct our experiments. Our statistical analysis of the experimental results indicates the following: (1) different types of features possess different discriminative capabilities in Chinese sentiment classification; (2) character bigram features perform the best among the N-gram features; (3) substring-group features have greater potential to improve the performance of sentiment classification by combining substrings of different lengths; (4) sentiment words or phrases extracted from existing sentiment lexicons are not effective for sentiment classification; (5) effective features are usually of varying lengths rather than fixed lengths.
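The character bigram features mentioned in finding (2) can be illustrated with a minimal sketch. The function name `char_ngrams` and the choice to drop whitespace are assumptions made for illustration; the paper's actual preprocessing is not described in the abstract.

```python
from collections import Counter


def char_ngrams(text, n=2):
    """Count overlapping character n-grams in text (whitespace is dropped)."""
    chars = [c for c in text if not c.isspace()]
    # Slide a window of width n over the characters.
    return Counter("".join(chars[i:i + n]) for i in range(len(chars) - n + 1))


# Bigram counts for a short Chinese review fragment ("the screen is very clear").
features = char_ngrams("屏幕很清晰")
```

Because Chinese text has no explicit word boundaries, character n-grams of this kind can be extracted without a word segmenter.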
Expert Systems With Applications | 2010
Qinghua Wen; Zehong Yang; Yixu Song; Peifa Jia
The stock market is considered a highly complex and dynamic system with noisy, non-stationary and chaotic data series, so it is widely acknowledged that stock price series modeling and forecasting is challenging work. A significant amount of work has been done in this field, in which soft computing techniques have shown good performance. Generally, most of this work can be divided into two categories: one predicts the future trend or price; the other constructs a decision support system that gives buy/sell signals. In this paper, we propose a new intelligent trading system based on oscillation box prediction by combining stock box theory and the support vector machine (SVM) algorithm. Box theory holds that a successful stock purchase or sale generally occurs when the price effectively breaks out of the original oscillation box into a new box. In the system, two SVM estimators are first used to forecast the upper and lower bounds of the price oscillation box. Then a trading strategy based on the two bound forecasts is constructed to make trading decisions. In the experiments, we test the system on different stock movement patterns, i.e. bull, bear and fluctuating markets, and investigate the training of the system and the choice of the time span of the price box. The experiments on 442 S&P 500 components show that promising performance is achieved and that the system dramatically outperforms the buy-and-hold strategy.
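The abstract does not spell out the trading rule, so the following is only a sketch under an assumed breakout rule: buy when the price exceeds the forecast upper bound, sell when it falls below the forecast lower bound, and hold otherwise. The names `box_signal` and `backtest` are hypothetical, and the bound forecasts are passed in as plain numbers instead of coming from SVM estimators.

```python
def box_signal(price, lower, upper):
    """Assumed breakout rule on a forecast oscillation box [lower, upper]."""
    if lower > upper:
        raise ValueError("lower bound must not exceed upper bound")
    if price > upper:
        return "buy"
    if price < lower:
        return "sell"
    return "hold"


def backtest(prices, lowers, uppers):
    """Trade at most one share at a time and return the realized profit."""
    entry_price = None  # None means no open position
    profit = 0.0
    for price, lower, upper in zip(prices, lowers, uppers):
        signal = box_signal(price, lower, upper)
        if signal == "buy" and entry_price is None:
            entry_price = price
        elif signal == "sell" and entry_price is not None:
            profit += price - entry_price
            entry_price = None
    return profit


# Buy at 101 (above the first box), hold at 110, sell at 104 (below the third box).
profit = backtest([101, 110, 104], [95, 100, 105], [100, 112, 115])
```

Transaction costs and position sizing are ignored here; a real evaluation would need both.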
web intelligence | 2008
Zhongwu Zhai; Hua Xu; Peifa Jia
Opinion leaders play a very important role in information diffusion; they are found in all fields of society and influence the opinions of the masses in their fields. Most proposed algorithms for identifying opinion leaders in Internet social networks are global measure algorithms and usually omit the fact that opinion leaders are field-limited. We propose and test several algorithms, including interest-field based algorithms and global measure algorithms, to identify opinion leaders in BBS. Our experiments show that different algorithms are sensitive to different indicators; the interest-field based algorithms, which take into account not only the social network's structure but also the users' interest space, are more reasonable and effective in identifying opinion leaders in BBS. The interest-field based algorithms are sensitive to the high-status nodes in the social network, and their performance relies on the quality of field discovery.
Applied Soft Computing | 2011
Jiadong Yang; Hua Xu; Li Pan; Peifa Jia; Fei Long; Ming Jie
Efficient task scheduling, a crucial step in achieving high performance on multiprocessor platforms, remains a challenging problem despite numerous studies. This paper presents a novel scheduling algorithm based on the Bayesian optimization algorithm (BOA) for heterogeneous computing environments. In the proposed algorithm, scheduling is divided into two phases. First, according to the task graph of the multiprocessor scheduling problem, Bayesian networks are initialized and learned to capture the dependencies between different tasks, and promising solutions that assign tasks to different processors are generated by sampling the Bayesian network. Second, the execution sequence of tasks on the same processor is set by the heuristic-based priority used in the list scheduling approach. The proposed algorithm is evaluated and compared with related approaches through empirical studies on random task graphs and benchmark applications. The experimental results show that the proposed algorithm is able to deliver more efficient schedules. Further experiments indicate that the proposed algorithm maintains almost the same performance with different parameter settings.
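The second phase described above can be sketched as follows. This is a minimal illustration, not the paper's implementation: the task-to-processor assignment is given directly instead of being sampled from a Bayesian network, the priority is assumed to be the upward rank commonly used in list scheduling, and communication delays are ignored. All names (`upward_rank`, `schedule`, the example graph) are hypothetical.

```python
def upward_rank(succ, mean_cost):
    """Longest mean-cost path from each task to an exit task, inclusive."""
    rank = {}

    def visit(task):
        if task not in rank:
            tail = max((visit(s) for s in succ.get(task, [])), default=0.0)
            rank[task] = mean_cost[task] + tail
        return rank[task]

    for task in mean_cost:
        visit(task)
    return rank


def schedule(succ, cost, assignment):
    """cost[t][p] is the run time of task t on processor p; assignment[t] = p.

    Returns (finish_times, makespan). Assumes positive costs and a DAG.
    """
    mean_cost = {t: sum(c.values()) / len(c) for t, c in cost.items()}
    rank = upward_rank(succ, mean_cost)
    pred = {t: [] for t in cost}
    for task, successors in succ.items():
        for s in successors:
            pred[s].append(task)
    # With positive costs a predecessor always outranks its successors,
    # so descending rank is also a valid topological order.
    order = sorted(cost, key=lambda t: -rank[t])
    proc_free, finish = {}, {}
    for task in order:
        proc = assignment[task]
        ready = max((finish[q] for q in pred[task]), default=0.0)
        start = max(ready, proc_free.get(proc, 0.0))
        finish[task] = start + cost[task][proc]
        proc_free[proc] = finish[task]
    return finish, max(finish.values())


succ = {"A": ["B", "C"], "B": ["D"], "C": ["D"]}
cost = {"A": {0: 2, 1: 3}, "B": {0: 3, 1: 2}, "C": {0: 4, 1: 4}, "D": {0: 1, 1: 2}}
finish, makespan = schedule(succ, cost, {"A": 0, "B": 1, "C": 0, "D": 0})
```

In an optimization loop, the makespan returned here would serve as the fitness of one sampled assignment.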
medical image computing and computer assisted intervention | 2008
Weiming Zhai; Jing Xu; Yannan Zhao; Yixu Song; Lin Sheng; Peifa Jia
A novel preoperative surgery planning method is proposed for percutaneous hepatic microwave ablation. An iterative framework for necrosis field simulation and 3D necrosis zone reconstruction is introduced, and the necrosis model is further superimposed on the patient's anatomical structures using advanced GPU-accelerated visualization techniques. The full surgery planning is performed interactively by the surgeon until the optimal surgery plan is achieved. Experiments have been performed on real hepatic cancer cases, and the actual necrosis zones are measured in postoperative CT images. Results show that this method is relatively accurate for preoperative trajectory planning and could be used as an aid in clinical practice.
data integration in the life sciences | 2007
Yu Zhang; Zhidong Deng; Hongshan Jiang; Peifa Jia
Using our dynamic Bayesian network with structural Expectation Maximization (SEM-DBN), we develop a new framework to model gene regulatory networks from both gene expression data and transcription factor binding site data. It is difficult to estimate a gene network accurately from mRNA expression data alone. In this paper, we use transcription factor binding location data to introduce prior knowledge into the SEM-DBN model. Gene expression data are also exploited specifically for the likelihood. Meanwhile, we incorporate the prior knowledge into every learning step of SEM rather than only at the very beginning, which compensates for the attenuation of the effect of the location data. The effectiveness of our proposed method is demonstrated through the analysis of Saccharomyces cerevisiae cell cycle data. The combination of heterogeneous data from multiple sources ensures that our results are more accurate than those recovered from gene expression data alone.
genetic and evolutionary computation conference | 2007
Yunpeng Cai; Xiaomin Sun; Hua Xu; Peifa Jia
This paper deals with the adaptive variance scaling issue in continuous Estimation of Distribution Algorithms (EDAs). We discover that the current adaptive variance scaling method in EDAs suffers from imprecise structure learning. A new type of adaptation method is proposed to overcome this defect. The method measures the difference between the obtained population and the prediction of the probabilistic model, then calculates the scaling factor by minimizing the cross entropy between these two distributions. This approach calculates the scaling factor immediately rather than adapting it incrementally. Experiments show that this approach extends the class of problems that can be solved and improves the search efficiency in some cases. Moreover, the proposed approach allows each decomposed subspace to be assigned an individual scaling factor, which helps to solve problems with special dimension properties.
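To see why minimizing cross entropy can yield the scaling factor in one step, consider the following worked derivation. It is a sketch under stated assumptions, not the paper's exact formulation: the model is taken to be a Gaussian $\mathcal{N}(\mu, c\,\Sigma)$ in $d$ dimensions with scaling factor $c > 0$, and the obtained population is $x_1, \dots, x_n$.

```latex
% Cross entropy between the empirical population and the scaled model
H(c) = -\frac{1}{n}\sum_{i=1}^{n} \log \mathcal{N}(x_i \mid \mu, c\,\Sigma)
     = \frac{d}{2}\log(2\pi c) + \frac{1}{2}\log|\Sigma| + \frac{\bar{m}}{2c},
\qquad
\bar{m} = \frac{1}{n}\sum_{i=1}^{n} (x_i-\mu)^{\top}\Sigma^{-1}(x_i-\mu).

% Setting the derivative to zero gives a closed-form minimizer
\frac{dH}{dc} = \frac{d}{2c} - \frac{\bar{m}}{2c^{2}} = 0
\quad\Longrightarrow\quad
c^{*} = \frac{\bar{m}}{d}.
```

Under these assumptions the factor is the mean squared Mahalanobis distance of the population divided by the dimension, so it is obtained directly from the current population with no incremental update. Applying the same formula within each decomposed subspace would give one factor per subspace.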
Information Sciences | 2012
Jiadong Yang; Hua Xu; Peifa Jia
Pittsburgh-style learning classifier systems (LCSs), in which an entire candidate solution is represented as a set of variable number of rules, combine supervised learning with genetic algorithms (GAs) to evolve rule-based classification models. It has been shown that standard crossover operators in GAs do not guarantee an effective evolutionary search in many sophisticated problems that contain strong interactions between features. In this paper, we propose a Pittsburgh-style learning classifier system based on the Bayesian optimization algorithm with the aim of improving the effectiveness and efficiency of the rule structure exploration. In the proposed method, classifiers are generated and recombined at two levels. At the lower level, single rules contained in classifiers are produced by sampling Bayesian networks which characterize the global statistical information extracted from the current promising rules in the search space. At the higher level, classifiers are recombined by rule-wise uniform crossover operators to keep the semantics of rules in each classifier. Experimental studies on both artificial and real world binary classification problems show that the proposed method converges faster while achieving solutions with the same or even higher accuracy compared with the original Pittsburgh-style LCSs.
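The higher-level recombination can be illustrated with a minimal sketch of a rule-wise uniform crossover. It assumes each classifier is a list of rules and that rules are exchanged position by position over the shared prefix when the two parents differ in length; the function name, the `swap_prob` parameter, and the rule encoding are hypothetical and not taken from the paper.

```python
import random


def rulewise_uniform_crossover(parent_a, parent_b, rng, swap_prob=0.5):
    """Swap whole rules between two classifiers; rules are never cut apart."""
    child_a, child_b = list(parent_a), list(parent_b)
    # Only positions present in both parents can be exchanged.
    for i in range(min(len(child_a), len(child_b))):
        if rng.random() < swap_prob:
            child_a[i], child_b[i] = child_b[i], child_a[i]
    return child_a, child_b


# Each rule is a (condition, class) pair; '#' is a wildcard bit.
parent_a = [("1#0", "1"), ("0##", "0"), ("11#", "1")]
parent_b = [("#01", "0"), ("1##", "1")]
child_a, child_b = rulewise_uniform_crossover(parent_a, parent_b, random.Random(0))
```

Because the exchange unit is an entire rule, every rule in a child is identical to some parent rule, which is one way to keep rule semantics intact during recombination.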