Network


Latest external collaborations at the country level.

Hotspot


Dive into the research topics where Yan-Xing Hu is active.

Publication


Featured research published by Yan-Xing Hu.


Expert Systems With Applications | 2015

OWA operator based link prediction ensemble for social network

Yu-Lin He; James N. K. Liu; Yan-Xing Hu; Xizhao Wang

This paper studies a link prediction ensemble for local information-based algorithms. The integration of the individual algorithms is performed via OWA operators, and experimental results show that the proposed ensemble outperforms the individual algorithms. The objective of link prediction for a social network is to estimate the likelihood that a link exists between two nodes. Although many local information-based algorithms have been proposed to handle this essential problem in social network analysis, empirical observations show that the stability of local information-based algorithms is usually very low, i.e., their variability is high. Thus, motivated by obtaining a stable link predictor with low variance, this paper proposes an ordered weighted averaging (OWA) operator based link prediction ensemble algorithm (LPEOWA) for social networks, which assigns aggregation weights to nine local information-based link prediction algorithms using three different OWA operators. Experimental results on benchmark social network datasets show that LPEOWA obtains a more stable prediction performance and considerably improves the prediction accuracy, measured by the area under the receiver operating characteristic curve (AUC), in comparison with the nine individual prediction algorithms.
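A minimal sketch of the ensemble idea, assuming a toy undirected graph stored as an adjacency dict: three of the nine local indices (common neighbors, Jaccard, Adamic-Adar) are scored for a node pair and aggregated with an OWA operator, i.e., a position-based weight vector applied to the descending-sorted scores. The graph, the weight vector, and the choice of three indices are illustrative, not the paper's setup:

```python
import math

# Hypothetical toy graph as an adjacency dict (undirected).
graph = {
    "a": {"b", "c", "d"}, "b": {"a", "c"},
    "c": {"a", "b", "d"}, "d": {"a", "c"},
}

def common_neighbors(g, x, y):
    return len(g[x] & g[y])

def jaccard(g, x, y):
    union = g[x] | g[y]
    return len(g[x] & g[y]) / len(union) if union else 0.0

def adamic_adar(g, x, y):
    # Each shared neighbor contributes 1 / log(degree).
    return sum(1.0 / math.log(len(g[z])) for z in g[x] & g[y] if len(g[z]) > 1)

def owa(scores, weights):
    # OWA: sort the scores in descending order, then take the weighted
    # sum with a fixed, position-based weight vector.
    ordered = sorted(scores, reverse=True)
    return sum(w * s for w, s in zip(weights, ordered))

def ensemble_score(g, x, y, weights=(0.5, 0.3, 0.2)):  # illustrative weights
    # In practice each index would be normalized to a common scale
    # across candidate pairs before aggregation.
    scores = [f(g, x, y) for f in (common_neighbors, jaccard, adamic_adar)]
    return owa(scores, weights)

print(ensemble_score(graph, "b", "d"))
```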


Information Sciences | 2014

A set covering based approach to find the reduct of variable precision rough set

James N. K. Liu; Yan-Xing Hu

Attribute reduction is one of the core problems in Rough Set (RS) theory. In the Variable Precision Rough Set (VPRS) model, attribute reduction faces two difficulties. Firstly, a reduct anomaly problem may arise and cause an inconsistency of positive regions and decision rules after attribute reduction. Secondly, the attribute reduction problem has been proved NP-hard; accordingly, we need a tradeoff between calculating the minimal reduct and containing computational complexity to avoid combinatorial explosion. We propose a new approach to calculate the reduct in the VPRS model, which focuses on calculating a β-distribution reduct while avoiding the anomaly problem. The basic idea of the proposed approach is to convert the reduct problem into a Set Covering Problem (SCP) according to the positive regions in the VPRS model; a Set-Covering Heuristic Function (SCHF) algorithm is then applied to calculate the reduct after this conversion. This approach keeps the positive regions consistent after attribute reduction, and moreover, based on the SCP, the performance ratio of the proposed method for calculating the minimal reduct ranges between ln(U′) − ln ln(U′) + o(1) and (1 − o(1)) ln(U′), with computational complexity bounded above by O(MN(M + N)²). Finally, we demonstrate a practical application of the VPRS model using a real case scenario from China’s electricity power yield to verify the validity of the proposed approach, and apply statistical evaluation to explain the economic significance of the attributes.
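A minimal sketch of the conversion idea under simplifying assumptions: each condition attribute "covers" the object pairs it can discern, and the classical greedy set-covering heuristic (repeatedly pick the attribute covering the most uncovered pairs) approximates a minimal reduct. The toy decision table is hypothetical, and the plain discernibility criterion omits the VPRS β-thresholding of positive regions:

```python
# Toy decision table: rows are objects, the last column is the decision.
table = [
    (0, 1, 0, "yes"),
    (1, 1, 0, "no"),
    (0, 0, 1, "yes"),
    (1, 0, 1, "no"),
]
n_attrs = 3

# Step 1: build the universe to cover -- all object pairs with different
# decisions -- and, per attribute, the set of pairs it discerns.
pairs = [(i, j) for i in range(len(table)) for j in range(i + 1, len(table))
         if table[i][-1] != table[j][-1]]
covers = {a: {p for p in pairs if table[p[0]][a] != table[p[1]][a]}
          for a in range(n_attrs)}

# Step 2: greedy set-covering heuristic -- repeatedly pick the attribute
# that discerns the most still-uncovered pairs.
uncovered, reduct = set(pairs), []
while uncovered:
    best = max(covers, key=lambda a: len(covers[a] & uncovered))
    if not covers[best] & uncovered:
        break  # remaining pairs are indiscernible by any attribute
    reduct.append(best)
    uncovered -= covers[best]

print("approximate reduct (attribute indices):", reduct)
```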


Neural Computing and Applications | 2013

Application of feature-weighted Support Vector regression using grey correlation degree to stock price forecasting

James N. K. Liu; Yan-Xing Hu

A feature-weighted Support Vector Machine regression algorithm is introduced in this paper. The classical SVM is based on the assumption that all features of the sample points contribute equally to the target output value; however, this assumption does not always hold in real problems. The proposed algorithm assigns different weights to different features of the samples in order to improve the performance of SVM. Firstly, a measure named the grey correlation degree is applied to evaluate the correlation between each feature and the target problem; the grey correlation degree values are then used as weights assigned to the features. The proposed method is tested on stock data sets selected from the China Shenzhen A-share market. The results show that the new version of SVM improves prediction accuracy.
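A minimal sketch of the weighting scheme, assuming synthetic data and sklearn's SVR as a stand-in regressor: each feature series is compared against the target with a grey relational grade (distinguishing coefficient rho = 0.5), and the grades are used as multiplicative feature weights before fitting:

```python
import numpy as np
from sklearn.svm import SVR

rng = np.random.default_rng(0)
X = rng.random((100, 4))                  # synthetic features (stand-in data)
y = 3 * X[:, 0] + 0.5 * X[:, 1] + rng.normal(0, 0.05, 100)

def grey_relational_grade(x, y, rho=0.5):
    # Grey relational coefficient of feature series x against target y,
    # averaged over samples; rho is the distinguishing coefficient.
    x = (x - x.min()) / (x.max() - x.min())   # normalize both series
    y = (y - y.min()) / (y.max() - y.min())
    d = np.abs(x - y)
    coef = (d.min() + rho * d.max()) / (d + rho * d.max())
    return coef.mean()

# One weight per feature, then rescale the inputs before a standard SVR.
weights = np.array([grey_relational_grade(X[:, j], y) for j in range(X.shape[1])])
model = SVR(kernel="rbf").fit(X * weights, y)
print("feature weights:", np.round(weights, 3))
```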


Archive | 2015

Deep Neural Network Modeling for Big Data Weather Forecasting

James N. K. Liu; Yan-Xing Hu; Pak Wai Chan; Lucas K. C. Lai

The coming of the big data era brings opportunities to greatly improve the forecasting accuracy of weather phenomena. Weather change is a complex process affected by thousands of variables. In traditional computational intelligence models, we have to select features from these variables according to fundamental assumptions, and the correctness of those assumptions can crucially affect prediction accuracy. The principle of big data, by contrast, is to let the data speak: when the volume of data is big enough, the hidden statistical regularities in domain data are revealed by the data set itself. Therefore, if a massive volume of weather data is employed, we may be able to avoid such assumptions in the models and improve weather prediction accuracy by learning the correlations hidden in the data. In our investigation, we employ a computational intelligence technique called the stacked Auto-Encoder to model 30 years of hourly weather data. This method automatically learns features from a massive data set via layer-by-layer feature granulation, and the large size of the data set helps the complex deep model avoid overfitting. The experimental results demonstrate that using the newly learned feature representation in a classical model yields higher accuracy on time series problems.
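A minimal sketch of greedy layer-wise pretraining with Keras, on random stand-in data rather than weather records; layer sizes and epoch counts are illustrative. Each auto-encoder is trained to reconstruct its own input, then only its encoder is kept and its codes become the next layer's input:

```python
import numpy as np
from tensorflow.keras import layers, Model

rng = np.random.default_rng(0)
X = rng.random((1000, 24)).astype("float32")  # stand-in for hourly weather features

def pretrain_layer(inputs, units):
    # Train one auto-encoder (encode -> decode) to reconstruct its input,
    # then keep only the encoder and return its codes for the next layer.
    inp = layers.Input(shape=(inputs.shape[1],))
    code = layers.Dense(units, activation="sigmoid")(inp)
    recon = layers.Dense(inputs.shape[1], activation="sigmoid")(code)
    ae = Model(inp, recon)
    ae.compile(optimizer="adam", loss="mse")
    ae.fit(inputs, inputs, epochs=5, batch_size=64, verbose=0)
    encoder = Model(inp, code)
    return encoder, encoder.predict(inputs, verbose=0)

# Greedy layer-by-layer feature granulation: 24 -> 16 -> 8.
enc1, h1 = pretrain_layer(X, 16)
enc2, h2 = pretrain_layer(h1, 8)
print("learned representation shape:", h2.shape)
```

The learned codes (h2 here) would then replace the raw inputs in a downstream time-series model, which is the usage the abstract describes.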


International Conference on Data Mining | 2012

A Weighted Support Vector Data Description Based on Rough Neighborhood Approximation

Yan-Xing Hu; James N. K. Liu; Yuan Wang; Lucas K. C. Lai

For support vector algorithms, sensitivity to noise points is considered one of the major problems that may affect the accuracy of results. In this paper, a weighted method based on rough neighborhood approximation is proposed to reduce the influence of noise points for the support vector data description (SVDD) algorithm, an important branch of the support vector model. Based on rough set theory, the training set is divided into three regions, and each point's weight value is determined by the region in which it is located. Experimental results show that the proposed method achieves higher acceptance accuracy than the classical support vector data description algorithm.
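A minimal sketch of the weighting idea under stated assumptions: each point's δ-neighborhood density splits the training set into three regions (core, boundary, sparse), which are mapped to sample weights; sklearn's OneClassSVM with an RBF kernel stands in for SVDD (the two formulations coincide for this kernel), and the thresholds and weight values are illustrative:

```python
import numpy as np
from sklearn.svm import OneClassSVM

rng = np.random.default_rng(0)
target = rng.normal(0, 1, (95, 2))
noise = rng.uniform(-6, 6, (5, 2))           # a few injected noise points
X = np.vstack([target, noise])

# Rough-neighborhood idea (simplified): count how many training points
# fall inside each point's delta-neighborhood, then split the set into
# three regions and weight them -- core points high, boundary medium,
# sparse (suspected noise) points low.
delta = 1.0
dist = np.linalg.norm(X[:, None] - X[None, :], axis=-1)
density = (dist < delta).sum(axis=1) - 1     # exclude the point itself
hi, lo = np.quantile(density, [0.75, 0.25])
weights = np.where(density >= hi, 1.0, np.where(density <= lo, 0.1, 0.5))

# OneClassSVM with an RBF kernel as a stand-in for SVDD; per-sample
# weights damp the influence of suspected noise points.
model = OneClassSVM(kernel="rbf", nu=0.1, gamma=0.5)
model.fit(X, sample_weight=weights)
print("accepted fraction:", (model.predict(X) == 1).mean())
```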


Applied Mathematics and Computation | 2012

Optimal bandwidth selection for re-substitution entropy estimation

James N. K. Liu; Xi-Zhao Wang; Yan-Xing Hu

A new fusion approach to selecting an optimal bandwidth for the re-substitution entropy estimator (RE) is presented in this study. When approximating continuous entropy with density estimation, two types of errors are generated: entropy estimation error (type-I error) and density estimation error (type-II error). Both errors depend strongly on the undetermined bandwidth. Firstly, experiments on 24 typical probability distributions demonstrate an inconsistency between the optimal bandwidths associated with these two errors. Secondly, two different error measures for the type-I and type-II errors are derived. A trade-off between the type-I and type-II errors is the fundamental property of the proposed method, called RE_{I+II}: the two errors are fused and an optimal bandwidth for RE_{I+II} is solved for. Finally, experimental comparisons are carried out to verify the estimation performance of the proposed strategy. Discretization has traditionally been deemed a necessary preprocessing step for calculating continuous entropy, so the nine most widely used unsupervised discretization methods are introduced to compare their computational performance with that of RE_{I+II}. Five popular estimators for entropy approximation are also included in the comparisons: the splitting data estimator (SDE), cross-validation estimator (CVE), m-spacing estimator (mSE), m_n-spacing estimator (mnSE), and nearest neighbor distance estimator (NNDE). Simulation studies on 24 typical density distributions show that RE_{I+II} obtains the best estimation performance among the methods involved, and the comparative results also reveal the estimation behaviors of the different entropy estimation methods. The empirical analysis demonstrates that RE_{I+II} is less sensitive to the data and generalizes better for the estimation of continuous entropy, making it possible to derive a handy optimal bandwidth from a given dataset.
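A minimal sketch of the RE estimator itself, with a naive bandwidth scan in place of the paper's fused type-I/type-II optimum: a Gaussian kernel density estimate is evaluated back at the sample points and the log-densities are averaged. The sample and the candidate bandwidths are illustrative:

```python
import numpy as np

rng = np.random.default_rng(0)
x = rng.normal(0, 1, 500)   # true differential entropy = 0.5*ln(2*pi*e) ~ 1.4189

def resubstitution_entropy(x, h):
    # RE estimator: H_hat = -(1/n) * sum_i log f_hat(x_i), where f_hat is
    # a Gaussian kernel density estimate with bandwidth h, evaluated back
    # at the sample points themselves ("re-substitution").
    n = len(x)
    d = (x[:, None] - x[None, :]) / h
    f_hat = np.exp(-0.5 * d**2).sum(axis=1) / (n * h * np.sqrt(2 * np.pi))
    return -np.mean(np.log(f_hat))

# Naive bandwidth scan; the paper instead derives the optimum by fusing
# the entropy-estimation and density-estimation error measures.
for h in (0.05, 0.2, 0.5, 1.0):
    print(f"h={h}: H_hat={resubstitution_entropy(x, h):.4f}")
```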


International Conference on Machine Learning and Cybernetics | 2011

A comparative study among different kernel functions in flexible naïve Bayesian classification

James N. K. Liu; Xi-Zhao Wang; Yan-Xing Hu

When determining the class of an unknown example with the naïve Bayesian classifier, we need to estimate the class-conditional probabilities of the continuous attributes. In the flexible Bayesian classifier, the Gaussian kernel function is frequently used for this task under the framework of the Parzen window method. In this paper, six other kernel functions (uniform, triangular, Epanechnikov, biweight, triweight, and cosine) are introduced into the flexible naïve Bayesian classifier. The performance of these seven kernels is compared on 30 UCI datasets along three aspects: classification accuracy, ranking performance, and class probability estimation; the latter two are measured by the area under the ROC curve (AUC) and the conditional log likelihood (CLL), respectively. The kernels are compared via a two-tailed t-test at a 95 percent confidence level and the Friedman test at the 0.05 critical level. The experimental results show that the most commonly used Gaussian kernel does not achieve the best classification accuracy or AUC; on CLL, however, the Gaussian kernel is statistically significantly better than the other six kernels. Corresponding analyses are given based on the experimental results.
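A minimal sketch of the Parzen-window class-conditional estimate under each of the seven kernels compared in the paper; the bandwidth, the query point, and the synthetic class sample are illustrative:

```python
import numpy as np

# Kernel functions usable inside a Parzen-window density estimate
# (u is the scaled distance (x - x_i) / h; each integrates to 1).
KERNELS = {
    "gaussian":     lambda u: np.exp(-0.5 * u**2) / np.sqrt(2 * np.pi),
    "uniform":      lambda u: 0.5 * (np.abs(u) <= 1),
    "triangular":   lambda u: (1 - np.abs(u)) * (np.abs(u) <= 1),
    "epanechnikov": lambda u: 0.75 * (1 - u**2) * (np.abs(u) <= 1),
    "biweight":     lambda u: (15 / 16) * (1 - u**2)**2 * (np.abs(u) <= 1),
    "triweight":    lambda u: (35 / 32) * (1 - u**2)**3 * (np.abs(u) <= 1),
    "cosine":       lambda u: (np.pi / 4) * np.cos(np.pi * u / 2) * (np.abs(u) <= 1),
}

def parzen_density(x, sample, h, kernel):
    # Flexible NB replaces the single-Gaussian class-conditional density
    # with a Parzen-window estimate over the class's training values.
    u = (x - sample) / h
    return KERNELS[kernel](u).mean() / h

rng = np.random.default_rng(0)
class_values = rng.normal(5.0, 1.0, 200)   # continuous attribute, one class
for name in KERNELS:
    print(f"{name:13s} p(x=5.3 | class) ~ {parzen_density(5.3, class_values, 0.5, name):.4f}")
```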


ACM Symposium on Applied Computing | 2014

A hybrid algorithm for recommending Twitter peers

James N. K. Liu; Zongnong Meng; Yan-Xing Hu; Simon C. K. Shiu; Vincent Cho

This paper presents a hybrid algorithm for peer recommendation on Twitter. Because of the volume of data on Twitter, we define a filtering strategy to reduce the number of candidates who might be recommended to the target user. We refine the content-based similarity and graph-based similarity algorithms proposed by other researchers, and define a user model and a weighting formula to combine the two. According to the similarity degree between the candidates and the target user, we recommend the top k most similar candidates to the target user as peers. To evaluate the effectiveness of the proposed algorithm against others, we conduct a personalized survey and employ metrics such as recall, precision, and the F1 measure. The evaluation results demonstrate that our hybrid algorithm outperforms both the pure content-based similarity algorithm and the pure graph-based similarity algorithm.
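A minimal sketch of the blending idea, assuming toy user models: cosine similarity over tweet-term vectors as the content-based score, Jaccard overlap of followee sets as the graph-based score, and a hypothetical mixing weight alpha; the filtering stage and the paper's exact weighting formula are omitted:

```python
import math

def cosine(u, v):
    # Content-based score: cosine similarity of term-frequency dicts.
    dot = sum(u[t] * v.get(t, 0) for t in u)
    norm = math.sqrt(sum(x * x for x in u.values())) * \
           math.sqrt(sum(x * x for x in v.values()))
    return dot / norm if norm else 0.0

def jaccard(a, b):
    # Graph-based score: overlap of followee sets.
    return len(a & b) / len(a | b) if a | b else 0.0

def recommend(target_vec, target_follows, candidates, alpha=0.6, k=2):
    # Blend the two scores and return the top-k candidates.
    scored = []
    for name, (vec, follows) in candidates.items():
        score = alpha * cosine(target_vec, vec) \
              + (1 - alpha) * jaccard(target_follows, follows)
        scored.append((score, name))
    return [name for _, name in sorted(scored, reverse=True)[:k]]

# Hypothetical users: tweet-term frequency vectors and followee sets.
target = ({"ml": 3, "nlp": 1}, {"u1", "u2"})
candidates = {
    "alice": ({"ml": 2, "vision": 1}, {"u1", "u3"}),
    "bob":   ({"sports": 4},          {"u4"}),
    "carol": ({"ml": 1, "nlp": 2},    {"u1", "u2", "u5"}),
}
print(recommend(*target, candidates))
```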


Archive | 2012

Sensitivity and Generalization of SVM with Weighted and Reduced Features

Yan-Xing Hu; James N. K. Liu; Liwei Jia

Support Vector Machine (SVM), a modern statistical learning method based on the principle of structural risk minimization rather than empirical risk minimization, has been widely applied to small-sample, non-linear, and high-dimensional problems. Many new versions of SVM have been proposed to improve its performance, some of which focus on processing the features, for example by assigning weight values to the features or removing unnecessary ones. A new feature-weighted SVM and a feature-reduced SVM are proposed in this chapter. The two versions are applied to regression tasks to predict the price of a certain stock, and their outputs are compared with the classical SVM. The results show that the proposed feature-weighted SVM can improve the accuracy of the regression, while the proposed feature-reduced SVM is sensitive to the test data sample.
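A minimal sketch of the feature-reduction variant, assuming synthetic data: features are scored for relevance (mutual information here as a stand-in for the chapter's weighting scheme), low-scoring features are dropped at an illustrative cut-off, and a standard SVR is fitted on the reduced inputs:

```python
import numpy as np
from sklearn.feature_selection import mutual_info_regression
from sklearn.svm import SVR

rng = np.random.default_rng(1)
X = rng.random((200, 6))                      # synthetic stand-in data
y = 2 * X[:, 0] + X[:, 2] + rng.normal(0, 0.05, 200)

# Score feature relevance (mutual information as a stand-in for the
# chapter's weighting scheme), then drop features below a threshold.
scores = mutual_info_regression(X, y)
keep = scores > 0.1 * scores.max()            # illustrative cut-off
print("kept feature indices:", np.flatnonzero(keep))

# Fit a standard SVR on the reduced feature set.
reduced_model = SVR(kernel="rbf").fit(X[:, keep], y)
```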


International Conference on Information Processing | 2012

Naive Bayesian Classifier Based on Neighborhood Probability

James N. K. Liu; Xi-Zhao Wang; Yan-Xing Hu

When calculating the class-conditional probability of continuous attributes with the naive Bayesian classifier (NBC), existing methods usually fit the true probability density function with a superposition of normal distribution probability density functions, so the class-conditional probability equals the sum of the values of those density functions. In this paper, we propose NPNBC, a naive Bayesian classifier based on neighborhood probability. In NPNBC, when calculating the class-conditional probability for a continuous attribute value of a given unknown example, a small neighborhood is created around the attribute value in every normal distribution probability density function, yielding a neighborhood probability for each component; the sum of these neighborhood probabilities is the class-conditional probability in NPNBC. Our experimental results demonstrate that NPNBC obtains remarkable classification accuracy compared with the normal method and the kernel method. We also investigate the relationship between the classification accuracy of NPNBC and the size of the neighborhood.
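A minimal sketch of the neighborhood-probability idea, assuming a hypothetical two-component normal mixture for the class-conditional density: instead of evaluating each component's pdf at x, the density is integrated over a small neighborhood [x − ε, x + ε] via the normal CDF, and the per-component neighborhood probabilities are summed:

```python
from scipy.stats import norm

def neighborhood_probability(x, mean, std, eps=0.05):
    # NPNBC idea: instead of evaluating the normal pdf at the point x,
    # integrate the density over a small neighborhood [x - eps, x + eps].
    return norm.cdf(x + eps, mean, std) - norm.cdf(x - eps, mean, std)

# Hypothetical class-conditional model: two normal components fitted to
# a continuous attribute; the class-conditional probability is the
# average of the per-component neighborhood probabilities.
components = [(4.8, 0.6), (6.1, 0.9)]     # illustrative (mean, std) pairs
x = 5.2
p = sum(neighborhood_probability(x, m, s) for m, s in components) / len(components)
print(f"P(x in [x-eps, x+eps] | class) ~ {p:.4f}")
```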

Collaboration


Dive into Yan-Xing Hu's collaborations.

Top Co-Authors

James N. K. Liu, Hong Kong Polytechnic University
Yuan Wang, Hong Kong Polytechnic University
Jane Jia You, Hong Kong Polytechnic University
Lucas K. C. Lai, Hong Kong Polytechnic University
Simon C. K. Shiu, Hong Kong Polytechnic University
Liwei Jia, Xi'an Jiaotong University
King Hong Cheung, Hong Kong Polytechnic University