Feng Honghai
University of Science and Technology Beijing
Publications
Featured research published by Feng Honghai.
International Conference on Knowledge-Based and Intelligent Information and Engineering Systems | 2005
Feng Honghai; Chen Guoshun; Yin Cheng; Yang Bing-ru; Chen Yumei
In the KDD process, filling in missing data typically requires a very large investment of time and energy: often 80% to 90% of a data analysis project is spent making the data reliable enough for the results to be trustworthy. In this paper, we propose an SVM-regression-based algorithm for filling in missing data: swap the decision attribute (output attribute) with the condition attribute (input attribute) that contains the gap, then use SVM regression to predict the missing condition attribute values. Experimental results on a SARS data set show that the SVM regression method achieves the highest precision. The nearest-neighbour method, which copies the value from the example at minimum distance to the example with the missing value, takes second place, while the mean and median methods have lower precision.
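The imputation idea described above can be sketched as follows, using scikit-learn's SVR as a stand-in for the paper's SVM regression (the toy data, kernel and parameters here are illustrative assumptions, not the authors' setup):

```python
# Sketch: impute a missing value in one attribute by treating that
# attribute as the regression output and the others as inputs.
import numpy as np
from sklearn.svm import SVR

# Toy decision table: rows are examples, columns are attributes;
# np.nan marks the missing value to be filled in.
X = np.array([
    [1.0, 2.0, 3.0],
    [2.0, 4.1, 6.2],
    [3.0, 6.0, 9.1],
    [4.0, 8.2, np.nan],   # missing value in the third attribute
    [5.0, 10.1, 15.0],
])

target_col = 2
missing = np.isnan(X[:, target_col])

# Fit SVR on the complete rows: inputs are the other attributes,
# output is the attribute that contains the gap.
model = SVR(kernel="linear", C=10.0)
model.fit(X[~missing][:, :target_col], X[~missing, target_col])

# Predict and fill in the missing entry.
X[missing, target_col] = model.predict(X[missing][:, :target_col])
print(X[3, target_col])
```

On this toy table the third attribute is roughly three times the first, so the filled-in value lands near 12.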
International Conference of the IEEE Engineering in Medicine and Biology Society | 2005
Feng Honghai; Chen Guoshun; Wang Yufeng; Yang Bingru; Chen Yumei
SARS is an acute infectious disease that can cause many deaths, and it is still not well understood. Using rough set theory on micronutrient measurements from 30 SARS patients and 30 non-SARS patients, we induce classification rules. Attribute reduction shows that the micronutrients Fe, Ca, K and Na are necessary and sufficient for classification, whereas Zn, Cu and Mg are unnecessary or redundant. Additionally, we find that the micronutrient Ca is strongly correlated with SARS. Classification results on 30 further examples show that the rough set classification method is effective.
International Conference on Knowledge-Based and Intelligent Information and Engineering Systems | 2005
Feng Honghai; Liu Baoyan; Yin Cheng; Li Ping; Yang Bing-ru; Chen Yumei
For an SVM classifier, pre-selecting training data is necessary to achieve a satisfactory classification rate while reducing complexity. According to rough set theory, the examples in the boundary region of a set belong to two or more classes and lie on the boundary between them; according to SVM theory, the support vectors lie on that boundary too. We therefore use rough set theory to select the examples in the boundary region of a set as the SVM training set, reducing the complexity of the SVM classifier while maintaining its accuracy. Experimental results on SARS data indicate that our scheme is effective in both the training and prediction stages.
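The boundary region used for the selection above can be computed directly from a decision table: group the examples by their condition attribute values and keep those groups whose members disagree on the decision. A minimal sketch (toy table; the real preprocessing is not specified in the abstract):

```python
# Sketch: find the rough set boundary region of a decision table,
# i.e. examples whose condition-equivalence class mixes decisions.
from collections import defaultdict

def boundary_region(rows, cond_idx, dec_idx):
    """Return indices of rows whose condition class contains >1 decision."""
    blocks = defaultdict(list)
    for i, row in enumerate(rows):
        blocks[tuple(row[j] for j in cond_idx)].append(i)
    boundary = []
    for members in blocks.values():
        if len({rows[i][dec_idx] for i in members}) > 1:
            boundary.extend(members)
    return boundary

# Toy table: two condition attributes, last column is the decision.
table = [
    (0, 0, "no"),
    (0, 0, "yes"),   # same condition values, different decision
    (1, 1, "yes"),
    (1, 0, "no"),
]
print(boundary_region(table, cond_idx=[0, 1], dec_idx=2))  # [0, 1]
```

Only rows 0 and 1 share condition values but differ in class, so they form the boundary region; under the paper's scheme these would be the examples handed to the SVM.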
International Conference on Knowledge-Based and Intelligent Information and Engineering Systems | 2006
Feng Honghai; Liu Baoyan; He Liyun; Yang Bing-ru; Chen Yumei; Zhao Shuo
Rough set based rule generation methods require discretization of continuous values. However, most existing discretization methods introduce inconsistencies. In this paper, we propose an algorithm that eliminates the inconsistencies introduced during discretization. The algorithm can be integrated into discretization algorithms that cannot avoid introducing inconsistencies. Experimental results on three data sets show that the algorithm is effective.
International Conference on Knowledge-Based and Intelligent Information and Engineering Systems | 2006
Feng Honghai; Liu Baoyan; He Liyun; Yang Bing-ru; Li Yueli
Most existing discretization methods, such as k-interval, equal-width and equal-frequency discretization, do not take into account the dependencies of decision attributes on condition attributes. In this paper, we propose a discretization algorithm that preserves these dependencies, i.e. preserves the positive regions of the partition induced by the decision attribute. When inducing classification rules from a data set, preserving these dependencies yields the smallest set of condition attributes and the highest classification precision.
International Conference on Knowledge-Based and Intelligent Information and Engineering Systems | 2006
Feng Honghai; Liu Baoyan; He Liyun; Yang Bing-ru; Li Yueli; Zhao Shuo
In fault diagnosis and medical diagnosis, more than one fault or disease often occurs at the same time. To obtain the factors that cause a single fault to develop into multiple faults, the standard rough set based methods must be rebuilt. In this paper, we propose a discernibility matrix based algorithm that reveals, for every single fault, the cause of its development into multiple faults. Additionally, we propose another rough set based algorithm to induce the common causes of all single faults developing into their corresponding multiple faults; this is a process of knowledge discovery in a rule base rather than in the usual database. Inducing more abstract rules from a knowledge base is a challenging problem that has not yet been resolved well.
International Conference on Knowledge-Based and Intelligent Information and Engineering Systems | 2006
Feng Honghai; Xu Hao; Liu Baoyan; He Liyun; Yang Bing-ru; Li Yueli
A real-world data set usually contains four kinds of mistaken values: mistakes in units, misplaced radix (decimal) points, scribal errors, and computational mistakes. In this paper, we propose two algorithms for finding these four kinds of mistaken data. Experimental results on SARS and coronary heart disease data sets show that the two algorithms are effective: using them we found mistakes in both data sets, and the results correspond to those found manually by medical experts.
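The abstract does not spell out the algorithms, but unit and decimal-point mistakes share a signature: the recorded value returns to a plausible range once divided or multiplied by a power of ten. A hypothetical sketch of that check (the attribute, range and readings are invented for illustration):

```python
# Sketch: flag values that look like unit or decimal-point mistakes,
# i.e. out-of-range values that become plausible after rescaling by
# a power of ten. The plausible range is assumed to be known.
def find_scale_mistakes(values, low, high):
    """Return (index, suggested_value) pairs for suspected scale errors."""
    suspects = []
    for i, v in enumerate(values):
        if low <= v <= high:
            continue                      # plausible as recorded
        for factor in (10, 100, 1000):
            for candidate in (v / factor, v * factor):
                if low <= candidate <= high:
                    suspects.append((i, candidate))
                    break
            else:
                continue                  # no candidate at this factor
            break
    return suspects

# Serum calcium in mmol/L is normally roughly 2.1-2.6, so a recorded
# 23.0 looks like a misplaced decimal point (23.0 -> 2.3).
readings = [2.3, 2.5, 23.0, 2.2]
print(find_scale_mistakes(readings, low=2.0, high=3.0))  # [(2, 2.3)]
```

Scribal and computational mistakes would need different checks (e.g. digit transposition candidates, or re-deriving computed attributes), which this sketch does not cover.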
Intelligent Data Engineering and Automated Learning | 2006
Feng Honghai; Xu Hao; Liu Baoyan; Yang Bing-ru; Gao Zhuye; Li Yueli
In the real world there is much knowledge of the following kind: most people infected by a certain virus suffer from the corresponding disease, but a small number do not. Which factors negate the effects of the virus? The standard rough set method can induce simplified classification rules, but cannot generate this kind of knowledge directly. In this paper, we propose two algorithms to find such factors. The first algorithm uses the typical rough set method to generate all the variable precision rules, then reduces attributes and generates all the non-variable precision rules, and finally compares the two rule sets to obtain the factors that negate the variable precision rules. The second algorithm first induces all the variable precision rules, then selects the examples corresponding to those rules to build discernibility matrices, and finally generates the factors that negate the variable precision rules. Three experiments show that the two algorithms yield the same results and that the computational complexity of the second is much lower than that of the first.
Industrial and Engineering Applications of Artificial Intelligence and Expert Systems | 2006
Feng Honghai; Zhao Shuo; Liu Baoyan; He Liyun; Yang Bing-ru; Li Yueli
The rough set discernibility matrix method is a valid approach to attribute reduction. However, it is NP-hard, and although some methods have been proposed to alleviate this, the situation has not improved much. We observe that the discernibility matrix idea can be applied not only to the whole data set but also to part of it, so we present a new algorithm to reduce the computational complexity. First, select the condition attribute C with the largest dependency measure γ(C, D), where D is the decision attribute. Second, build a discernibility matrix over the examples in the non-positive region to generate an attribute reduction. Third, combine the attributes from the two steps into the attribute reduction set. We also prove the rationality of the method: the larger the positive region is, the more the complexity is reduced. Four experimental results indicate that the computational complexity is reduced by 67%, 83%, 41% and 30% respectively, while the reduced attribute sets are the same as those of the standard discernibility matrix method.
International Conference on Knowledge Capture | 2005
Feng Honghai; Liu Baoyan; He Liyun
Because the number of missing values in our SARS data set is very large, filling them all in with existing methods is impossible, or the filled-in results are unreliable. Considering only two attributes at a time avoids using the large number of missing values, a feature of rough sets that other machine learning methods do not have. In this paper, we induced rules from the SARS data set using rough sets that had not previously been detected by medical experts in clinical practice.