Houkuan Huang | Researchain

Publication

Featured research published by Houkuan Huang.

Expert Systems With Applications | 2009

Feature selection for text classification with Naïve Bayes

Jingnian Chen; Houkuan Huang; Shengfeng Tian; Youli Qu

As an important preprocessing technology in text classification, feature selection can improve the scalability, efficiency and accuracy of a text classifier. In general, a good feature selection method should consider domain and algorithm characteristics. Because the Naive Bayesian classifier is simple, efficient and highly sensitive to feature selection, research on feature selection designed specifically for it is significant. This paper presents two feature evaluation metrics for the Naive Bayesian classifier applied to multi-class text datasets: Multi-class Odds Ratio (MOR) and Class Discriminating Measure (CDM). Text classification experiments with Naive Bayesian classifiers were carried out on two multi-class text collections. The results indicate that CDM and MOR select features clearly better than other feature selection approaches.

Expert Systems With Applications | 2007

A novel feature selection algorithm for text categorization

Wenqian Shang; Houkuan Huang; Haibin Zhu; Yongmin Lin; Youli Qu; Zhihai Wang

With the development of the web, large numbers of documents are available on the Internet. Digital libraries, news sources and internal company data keep growing. Automatic text categorization is becoming more and more important for dealing with massive data. However, the major problem of text categorization is the high dimensionality of the feature space. Many methods for text feature selection already exist. To improve the performance of text categorization, we present another method of text feature selection. Our study is based on Gini index theory, and we design a novel Gini index algorithm to reduce the high dimensionality of the feature space. A new measure function of the Gini index is constructed and adapted to text categorization. Experimental results show that our improved Gini index performs better than other feature selection methods.
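
For orientation, the classical Gini index that this line of work starts from can be sketched as below. This is the textbook impurity measure applied to the class labels of documents containing a term, not the modified measure function proposed in the paper (which the abstract does not specify). The function names and the toy documents are assumptions made for illustration.

```python
from collections import Counter

def gini_impurity(labels):
    """Classical Gini impurity: 1 minus the sum of squared class proportions."""
    n = len(labels)
    return 1.0 - sum((count / n) ** 2 for count in Counter(labels).values())

def term_gini(docs, labels, term):
    """Gini impurity of the class labels among documents that contain `term`.

    docs is a list of sets of tokens; returns None if no document has the term.
    """
    hits = [label for doc, label in zip(docs, labels) if term in doc]
    return gini_impurity(hits) if hits else None

# Toy corpus (hypothetical): a lower impurity means the term is more class-specific.
docs = [{"goal", "match"}, {"goal"}, {"vote"}, {"vote", "match"}]
labels = ["sport", "sport", "politics", "politics"]
ranked = sorted({"goal", "match", "vote"}, key=lambda t: term_gini(docs, labels, t))
```

Under this classical measure, terms that occur in only one class score 0 and would be ranked first; a term spread evenly over two classes scores 0.5.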

Computers & Operations Research | 2009

An iterated local search algorithm for the permutation flowshop problem with total flowtime criterion

Xingye Dong; Houkuan Huang; Ping Chen

An iterated local search (ILS) algorithm is proposed to solve the permutation flowshop sequencing problem with the total flowtime criterion. The effects of different initial permutations and different perturbation strengths are studied. Comparisons are carried out with three constructive heuristics, three ant-colony algorithms and a particle swarm optimization algorithm. Experiments on benchmarks and a set of random instances show that the proposed algorithm is more effective. The presented ILS improves the best known permutations by a significant margin.
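
A minimal sketch of the general ILS scheme on this problem may help make the terms concrete. It is a generic version under stated assumptions, not the algorithm from the paper: the insertion neighborhood, the random-swap perturbation, the acceptance rule, and all function and parameter names (`ils`, `strength`, `iterations`) are choices made here for illustration.

```python
import random

def total_flowtime(perm, p):
    """Sum of job completion times on the last machine.

    p[j][k] is the processing time of job j on machine k.
    """
    m = len(p[0])
    finish = [0] * m  # finish[k]: completion time of the latest job on machine k
    total = 0
    for j in perm:
        finish[0] += p[j][0]
        for k in range(1, m):
            finish[k] = max(finish[k], finish[k - 1]) + p[j][k]
        total += finish[m - 1]
    return total

def local_search(perm, p):
    """First-improvement insertion moves, repeated until none improves."""
    best = total_flowtime(perm, p)
    improved = True
    while improved:
        improved = False
        for i in range(len(perm)):
            for pos in range(len(perm)):
                if pos == i:
                    continue
                cand = perm[:i] + perm[i + 1:]
                cand.insert(pos, perm[i])
                cost = total_flowtime(cand, p)
                if cost < best:
                    perm, best, improved = cand, cost, True
                    break
            if improved:
                break
    return perm, best

def ils(p, iterations=50, strength=2, seed=0):
    """Generic ILS: perturb the current solution, re-optimize, accept if not worse."""
    rng = random.Random(seed)
    n = len(p)  # assumes at least two jobs
    current, cur_cost = local_search(list(range(n)), p)
    best, best_cost = current, cur_cost
    for _ in range(iterations):
        cand = current[:]
        for _ in range(strength):  # perturbation strength = number of random swaps
            a, b = rng.sample(range(n), 2)
            cand[a], cand[b] = cand[b], cand[a]
        cand, cost = local_search(cand, p)
        if cost <= cur_cost:
            current, cur_cost = cand, cost
        if cost < best_cost:
            best, best_cost = cand, cost
    return best, best_cost
```

Here the initial permutation is simply the identity order and `strength` controls how far each perturbation moves from the current local optimum; these are the two factors the abstract says were studied.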

Information Sciences | 2007

Evolutionary programming using a mixed mutation strategy

Hongbin Dong; Jun He; Houkuan Huang; Wei Hou

Different mutation operators have been proposed in evolutionary programming, but for each operator there are some types of optimization problems that cannot be solved efficiently. A mixed strategy, integrating several mutation operators into a single algorithm, can overcome this problem. Inspired by evolutionary game theory, this paper presents a mixed strategy evolutionary programming algorithm that employs the Gaussian, Cauchy, Levy, and single-point mutation operators. The novel algorithm is tested on a set of 22 benchmark problems. The results show that, for all of the benchmark problems, the mixed strategy performs as well as or better than the best of the four pure strategies.

Expert Systems With Applications | 2010

Iterated variable neighborhood descent algorithm for the capacitated vehicle routing problem

Ping Chen; Houkuan Huang; Xingye Dong

The capacitated vehicle routing problem (CVRP) aims to determine the minimum total cost routes for a fleet of homogeneous vehicles to serve a set of customers. A wide spectrum of applications outlines the relevance of this problem. In this paper, a hybrid heuristic method, IVND, with variable neighborhood descent based on multi-operator optimization is proposed for solving the CVRP. A perturbation strategy based on the cross-exchange operator has been designed to help the search escape from local minima. The algorithm has been tested on 34 CVRP benchmark problems, and the results show that the proposed IVND performs well and is quite competitive with other state-of-the-art heuristics. Additionally, the proposed IVND is flexible and problem dependent, as well as easy to implement.

Computers & Operations Research | 2008

An improved NEH-based heuristic for the permutation flowshop problem

Xingye Dong; Houkuan Huang; Ping Chen

NEH is an effective heuristic for solving the permutation flowshop problem with the makespan objective. It includes two phases: generating an initial sequence and then constructing a solution. The initial sequence is studied, and a strategy is proposed to resolve job insertion ties that may arise in the construction process. An initial sequence generated by combining the average processing times of jobs and their standard deviations shows better performance. The proposed strategy is based on the idea of balancing the utilization among all machines. Experiments show that using this strategy can improve the performance of NEH significantly. Based on the above ideas, a heuristic NEH-D (NEH based on Deviation) is proposed, whose time complexity is O(mn^2), the same as that of NEH. Computational results on benchmarks show that NEH-D is significantly better than the original NEH.
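
The two phases of the original NEH can be sketched as follows. This is the baseline heuristic, not NEH-D: the initial order uses total processing time only, and insertion ties are broken by keeping the earliest position, which is an arbitrary choice made here rather than the paper's balancing strategy. For brevity the sketch recomputes the makespan from scratch at every trial position, so it runs in O(mn^3); the O(mn^2) bound quoted above requires an accelerated evaluation of insertion positions that is omitted here.

```python
def makespan(perm, p):
    """Completion time of the last job on the last machine.

    p[j][k] is the processing time of job j on machine k.
    """
    m = len(p[0])
    finish = [0] * m
    for j in perm:
        finish[0] += p[j][0]
        for k in range(1, m):
            finish[k] = max(finish[k], finish[k - 1]) + p[j][k]
    return finish[-1]

def neh(p):
    """Baseline NEH: sort by total processing time, then insert greedily."""
    # Phase 1: initial sequence in non-increasing order of total processing time.
    order = sorted(range(len(p)), key=lambda j: -sum(p[j]))
    # Phase 2: insert each job at the position with the smallest partial makespan.
    seq = []
    for j in order:
        best_seq, best_cost = None, None
        for pos in range(len(seq) + 1):
            cand = seq[:pos] + [j] + seq[pos:]
            cost = makespan(cand, p)
            if best_cost is None or cost < best_cost:  # tie: keep earliest position
                best_seq, best_cost = cand, cost
        seq = best_seq
    return seq, makespan(seq, p)
```

The strict `<` comparison is where a tie-breaking rule plugs in: any alternative strategy would replace that test with a secondary criterion applied when two positions give the same partial makespan.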

International Conference on Machine Learning and Cybernetics | 2003

Improving one-class SVM for anomaly detection

Kun-Lun Li; Houkuan Huang; Sheng-Feng Tian; Wei Xu

With the tremendous growth of the Internet, information system security has become an issue of serious global concern due to rapid connection and accessibility. Developing effective methods for intrusion detection is therefore an urgent task for assuring computer and information system security. Since most attacks and misuses can be recognized through the examination of system audit log files and pattern analysis therein, an approach for intrusion detection can be built on them. We first analyze attack and misuse patterns in log files in depth, and then propose an approach using support vector machines for anomaly detection. It is a one-class SVM based approach, trained with abstracted user audit log data from 1999 DARPA.

European Symposium on Research in Computer Security | 2008

Online Risk Assessment of Intrusion Scenarios Using D-S Evidence Theory

C. P. Mu; Xiangjun Li; Houkuan Huang; Shengfeng Tian

In this paper, an online risk assessment model based on D-S evidence theory is presented. The model can quantify the risk caused by an intrusion scenario in real time and provide an objective evaluation of the target security state. The results of the online risk assessment show a clear and concise picture of both the intrusion progress and the target security state. The model makes full use of available information from both IDS alerts and protected targets. As a result, it can deal with uncertainty and subjectivity very well in its evaluation process. In IDAM&IRS, the model serves as the foundation for intrusion response decision-making.
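
The core operation of D-S evidence theory is Dempster's rule of combination, sketched below. It shows only the standard rule for fusing two mass functions; how the paper derives masses from IDS alerts and target information is not described in the abstract, so the hypotheses "attack" and "normal" and the numeric masses are made-up illustrations.

```python
from itertools import product

def combine(m1, m2):
    """Dempster's rule of combination.

    m1 and m2 map focal elements (frozensets of hypotheses) to masses
    that each sum to 1. Mass on empty intersections is treated as
    conflict and normalized away.
    """
    combined = {}
    conflict = 0.0
    for (a, wa), (b, wb) in product(m1.items(), m2.items()):
        inter = a & b
        if inter:
            combined[inter] = combined.get(inter, 0.0) + wa * wb
        else:
            conflict += wa * wb
    if conflict >= 1.0:
        raise ValueError("sources are in total conflict; rule is undefined")
    return {s: w / (1.0 - conflict) for s, w in combined.items()}

# Two hypothetical evidence sources over the frame {attack, normal}.
ATTACK = frozenset({"attack"})
FRAME = frozenset({"attack", "normal"})  # mass here expresses ignorance
source_1 = {ATTACK: 0.6, FRAME: 0.4}
source_2 = {ATTACK: 0.7, FRAME: 0.3}
fused = combine(source_1, source_2)
```

Mass assigned to the whole frame lets a source express uncertainty without committing to either hypothesis, which is one reason the theory suits evidence of uneven reliability.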

Computers & Operations Research | 2013

A multi-restart iterated local search algorithm for the permutation flow shop problem minimizing total flow time

Xingye Dong; Ping Chen; Houkuan Huang; Maciek Nowak

A variety of metaheuristics have been developed to solve the permutation flow shop problem minimizing total flow time. Iterated local search (ILS) is a simple but powerful metaheuristic used to solve this problem. Fundamentally, ILS is a procedure that needs to be restarted from another solution when it is trapped in a local optimum. A new solution is often generated by only slightly perturbing the best known solution, narrowing the search space and leading to a stagnant state. In this paper, a strategy is proposed to allow the restart solution to be generated from a group of solutions drawn from local optima. This allows an extension of the search space, while maintaining the quality of the restart solution. A multi-restart ILS (MRSILS) is proposed, with the performance evaluated on a set of benchmark instances and compared with six state-of-the-art metaheuristics. The results show that the easily implementable MRSILS is significantly better than five of the other metaheuristics and comparable to or slightly better than the remaining one.

Knowledge Based Systems | 2008

A selective Bayes Classifier for classifying incomplete data based on gain ratio

Jingnian Chen; Houkuan Huang; Fengzhan Tian; Shengfeng Tian

Real-world data sets are often incomplete for a variety of reasons. Although numerous classification algorithms have been proposed, most of them deal with complete data, so methods of constructing classifiers for incomplete data deserve more attention. By analyzing the main methods of processing incomplete data for classification, this paper presents a selective Bayes Classifier for classifying incomplete data with a simpler formula for computing gain ratio. The proposed algorithm needs none of the assumptions about data sets that previous methods of processing incomplete data in classification require. Experiments on 12 benchmark incomplete data sets show that this method can greatly improve the accuracy of classification. Furthermore, it can sharply reduce the number of attributes and so can greatly simplify the data sets and classifiers.
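
For reference, the standard gain ratio on complete data can be sketched as follows. This is the textbook definition (information gain divided by split information), not the simplified formula for incomplete data that the paper proposes, which the abstract does not give. The function names are illustrative, and missing values are not handled.

```python
from collections import Counter
from math import log2

def entropy(items):
    """Shannon entropy (in bits) of the empirical distribution of `items`."""
    n = len(items)
    return -sum(c / n * log2(c / n) for c in Counter(items).values())

def gain_ratio(values, labels):
    """Information gain of an attribute divided by its split information.

    values[i] is the attribute value and labels[i] the class of instance i.
    Returns 0.0 when the attribute takes a single value (zero split information).
    """
    n = len(labels)
    groups = {}
    for v, y in zip(values, labels):
        groups.setdefault(v, []).append(y)
    conditional = sum(len(g) / n * entropy(g) for g in groups.values())
    gain = entropy(labels) - conditional
    split_info = entropy(values)
    return gain / split_info if split_info > 0 else 0.0
```

A selective classifier in this spirit would rank attributes by such a score and keep only the highest-ranked ones, which is how attribute counts can shrink as the abstract describes.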
