George H. John
Stanford University
Publication
Featured research published by George H. John.
Artificial Intelligence | 1997
Ron Kohavi; George H. John
In the feature subset selection problem, a learning algorithm is faced with the problem of selecting a relevant subset of features upon which to focus its attention, while ignoring the rest. To achieve the best possible performance with a particular learning algorithm on a particular training set, a feature subset selection method should consider how the algorithm and the training set interact. We explore the relation between optimal feature subset selection and relevance. Our wrapper method searches for an optimal feature subset tailored to a particular algorithm and a domain. We study the strengths and weaknesses of the wrapper approach and show a series of improved designs. We compare the wrapper approach to induction without feature subset selection and to Relief, a filter approach to feature subset selection. Significant improvement in accuracy is achieved for some datasets for the two families of induction algorithms used: decision trees and Naive-Bayes.
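The wrapper idea reduces to a search over feature subsets in which each candidate subset is scored by running the induction algorithm itself. A minimal sketch, assuming greedy forward selection and cross-validated accuracy (one of several search strategies the paper studies; the function name is illustrative, not the authors' code):

```python
# Hedged sketch of wrapper feature selection: greedy forward search,
# scoring each candidate subset by the cross-validated accuracy of the
# target learner (a decision tree here, as in the paper's experiments).
import numpy as np
from sklearn.model_selection import cross_val_score
from sklearn.tree import DecisionTreeClassifier

def wrapper_forward_select(X, y, estimator=None, cv=5):
    estimator = estimator or DecisionTreeClassifier()
    n_features = X.shape[1]
    selected, best_score = [], -np.inf
    improved = True
    while improved:
        improved = False
        for f in set(range(n_features)) - set(selected):
            candidate = selected + [f]
            # Score the subset by actually running the learner on it.
            score = cross_val_score(estimator, X[:, candidate], y, cv=cv).mean()
            if score > best_score:
                best_score, best_candidate = score, candidate
                improved = True
        if improved:
            selected = best_candidate
    return selected, best_score
```

Because every candidate subset is evaluated by the learner on the training data, the selected subset is tailored to that algorithm and dataset, at the cost of many induction runs.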
international conference on tools with artificial intelligence | 1994
Ron Kohavi; George H. John; Richard Long; David Manley; Karl Pfleger
We present MLC++, a library of C++ classes and tools for supervised machine learning. While MLC++ provides general learning algorithms that can be used by end users, the main objective is to provide researchers and experts with a wide variety of tools that can accelerate algorithm development, increase software reliability, provide comparison tools, and display information visually. More than just a collection of existing algorithms, MLC++ is an attempt to extract commonalities of algorithms and decompose them for a unified view that is simple, coherent, and extensible. In this paper we discuss the problems MLC++ aims to solve, the design of MLC++, and the current functionality.
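The "unified view" the library aims for can be pictured as a single inducer contract that every algorithm implements, so evaluation and comparison tools are written once against the interface. A minimal sketch of that design idea in Python, not MLC++'s actual C++ class hierarchy:

```python
# Illustrative sketch of the unified-inducer design: all learning
# algorithms share one train/predict contract, so generic tooling
# (evaluation, comparison, visualization) works for any of them.
from abc import ABC, abstractmethod

class Inducer(ABC):
    @abstractmethod
    def train(self, instances, labels):
        """Build a model from labeled training instances."""

    @abstractmethod
    def predict(self, instance):
        """Return the predicted label for a single instance."""

class MajorityInducer(Inducer):
    """Trivial inducer: always predicts the most common training label."""
    def train(self, instances, labels):
        self.majority = max(set(labels), key=list(labels).count)

    def predict(self, instance):
        return self.majority

def accuracy(inducer, instances, labels):
    # Written once against the interface; works for any Inducer.
    return sum(inducer.predict(x) == y for x, y in zip(instances, labels)) / len(labels)
```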
Archive | 1998
Ron Kohavi; George H. John
In the feature subset selection problem, a learning algorithm is faced with the problem of selecting a relevant subset of features upon which to focus its attention, while ignoring the rest. To achieve the best possible performance with a particular learning algorithm on a particular training set, a feature subset selection method should consider how the algorithm and the training set interact. We explore the relation between optimal feature subset selection and relevance. The wrapper method searches for an optimal feature subset tailored to a particular algorithm and a domain. We compare the wrapper approach to induction without feature subset selection and to Relief, a filter approach to feature subset selection. Improvement in accuracy is achieved for some datasets for the two families of induction algorithms used: decision trees and Naive-Bayes. In addition, the feature subsets selected by the wrapper are significantly smaller than the original subsets used by the learning algorithms, thus producing more comprehensible models.
IEEE Intelligent Systems | 1996
George H. John; Peter Miller; Randy Kerber
High-quality financial databases have existed for many years, but human analysts can only scratch the surface of the wealth of knowledge buried in this data. Using the rule-induction technology in the Recon data-mining system, an investment strategy based purely on the learned rules can generate significant profits.
international conference on artificial intelligence and statistics | 1996
George H. John
We present a new method for the induction of classification trees with linear discriminants as the partitioning function at each internal node. This paper presents two main contributions: first, a novel objective function called soft entropy, which is used to identify optimal coefficients for the linear discriminants, and second, a novel method for removing outliers called iterative refiltering, which boosts performance on many datasets. These two ideas are presented in the context of a single learning algorithm called DT-SEPIR, which is compared with the CART and OC1 algorithms.
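One plausible reading of a soft entropy objective is sketched below, under the assumption that sigmoid memberships replace the hard assignment of examples to the two sides of the hyperplane w·x + b, making the impurity differentiable in (w, b). The paper's exact functional form may differ:

```python
# Hedged sketch of a "soft entropy" split criterion: each example
# contributes fractional membership to both sides of the hyperplane
# via a sigmoid, so the weighted entropy is smooth in (w, b) and can
# be minimized by gradient methods. X, y are numpy arrays.
import numpy as np

def soft_entropy(w, b, X, y, classes):
    m = 1.0 / (1.0 + np.exp(-(X @ w + b)))   # soft membership, "positive" side
    total = 0.0
    for side_weight in (m, 1.0 - m):         # positive side, negative side
        n_side = side_weight.sum()
        if n_side <= 0:
            continue
        # Soft class proportions on this side of the split.
        p = np.array([side_weight[y == c].sum() for c in classes]) / n_side
        p = p[p > 0]
        total += (n_side / len(y)) * -(p * np.log2(p)).sum()
    return total  # lower is purer; minimize over (w, b)
```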
Proceedings of the IEEE/IAFE 1997 Computational Intelligence for Financial Engineering (CIFEr) | 1997
George H. John; Yin Zhao
The paper reports a preliminary investigation of the use of modern data mining tools for mortgage scoring. Using IBM's Intelligent Miner (a data mining toolbox), the authors built a model of serious delinquency on a sample of data from Mortgage Information Corporation's Loan Performance System, which contains over 20 million loans with a volume of over $1.6 trillion. Currently, two technologies prevail in mortgage scoring: logistic regression, a very old and very simple method, and neural networks, newer and more complex types of models that can be extremely difficult to interpret. The radial basis function (RBF) algorithm in Intelligent Miner combines the mathematical complexity and generality of neural networks with a comprehensible visualization that explains the RBF model. Due to the performance and understandability of the RBF model, as well as other unique technologies not described here, Intelligent Miner should be a useful tool for mortgage bankers, facilitating development of customized systems for mortgage scoring and other mortgage banking applications.
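As a rough picture of the model class being described, the sketch below is a generic Gaussian RBF network with a linear readout; the prototype centers are points in the input space, which is what makes such models amenable to visualization. This is an illustration under those assumptions, not IBM Intelligent Miner's implementation:

```python
# Hedged sketch of a radial basis function model: Gaussian "prototype"
# units over the inputs, combined by a linear readout fit with least
# squares. Each center is an interpretable point in feature space.
import numpy as np

class RBFModel:
    def __init__(self, centers, width):
        self.centers = np.asarray(centers)  # (k, d) prototype points
        self.width = width

    def _features(self, X):
        # Gaussian activation of each prototype for each input row.
        d2 = ((X[:, None, :] - self.centers[None, :, :]) ** 2).sum(-1)
        return np.exp(-d2 / (2 * self.width ** 2))

    def fit(self, X, y):
        # Linear least-squares readout over the RBF features.
        Phi = self._features(np.asarray(X))
        self.w, *_ = np.linalg.lstsq(Phi, np.asarray(y, float), rcond=None)
        return self

    def predict_score(self, X):
        return self._features(np.asarray(X)) @ self.w
```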
IEEE/IAFE 1996 Conference on Computational Intelligence for Financial Engineering (CIFEr) | 1996
George H. John; Peter Miller
We approach stock selection for long/short portfolios from the perspective of knowledge discovery in databases and rule induction: given a database of historical information on some universe of stocks, discover rules from the data that will allow one to predict which stocks are likely to have exceptionally high or low returns in the future. Long/short portfolios allow a fund manager to independently address value-added stock selection and factor exposure, and are a popular tool in financial engineering. For stock selection we employed the Recon system, which is able to induce a set of rules to model the data it is given. We evaluate Recon's stock selection performance by using it to build equitized long/short portfolios over eighteen quarters of historical data from October 1988 to March 1993, repeatedly using the previous four quarters of data to build a model which is then used to rank stocks in the current quarter. When trading costs were taken into account, Recon's equitized long/short portfolio had a total return of 277%, significantly outperforming the benchmark (S&P 500), which returned 92.5% over the same period. We conclude that rule induction is a valuable tool for stock selection.
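The evaluation protocol is a rolling out-of-sample backtest. A hedged sketch of that protocol, with a decision tree standing in for Recon's induced rule set and a hypothetical `quarters` list of (features, forward-returns) pairs:

```python
# Hedged sketch of the rolling evaluation described above: each quarter,
# fit on the previous four quarters, rank stocks by predicted return,
# go long the top-ranked names and short the bottom-ranked ones.
import numpy as np
from sklearn.tree import DecisionTreeRegressor

def rolling_long_short(quarters, top_frac=0.1, lookback=4):
    pnl = []
    for t in range(lookback, len(quarters)):
        X_tr = np.vstack([quarters[i][0] for i in range(t - lookback, t)])
        y_tr = np.concatenate([quarters[i][1] for i in range(t - lookback, t)])
        model = DecisionTreeRegressor(max_depth=4).fit(X_tr, y_tr)

        X_now, r_now = quarters[t]
        scores = model.predict(X_now)
        k = max(1, int(top_frac * len(scores)))
        order = np.argsort(scores)
        # Long the k highest-scored stocks, short the k lowest-scored.
        pnl.append(r_now[order[-k:]].mean() - r_now[order[:k]].mean())
    return pnl
```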
international symposium on neural networks | 1995
George H. John
Discusses the weight update rule in the cascade correlation neural net learning algorithm. The weight update rule implements gradient descent optimization of the correlation between a new hidden unit's output and the previous network's error. The author presents a derivation of the gradient of the correlation function and shows that the resulting weight update rule results in slightly faster training. The author also shows that the new rule is mathematically equivalent to the one presented in the original cascade correlation paper and discusses numerical issues underlying the difference in performance. Since a derivation of the cascade correlation weight update rule was not published, this paper should be useful to those who wish to understand the rule.
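For reference, the quantity being differentiated is the covariance-magnitude score for candidate units from Fahlman and Lebiere's original cascade-correlation formulation; the following is a reconstruction from that published algorithm, not this paper's derivation:

```latex
% Candidate-unit objective in cascade-correlation: maximize the magnitude
% of the covariance between the candidate's output V_p and the residual
% error E_{p,o}, summed over patterns p and network outputs o.
S = \sum_{o}\Bigl|\sum_{p}\bigl(V_p-\bar{V}\bigr)\bigl(E_{p,o}-\bar{E}_o\bigr)\Bigr|,
\qquad
\frac{\partial S}{\partial w_i} = \sum_{p,o}\sigma_o\,\bigl(E_{p,o}-\bar{E}_o\bigr)\,f'_p\,I_{i,p}
```

Here V_p is the candidate unit's output on pattern p, E_{p,o} the residual error at output o, sigma_o the sign of the inner sum for output o, f'_p the derivative of the candidate's activation function, and I_{i,p} its i-th input on pattern p.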
uncertainty in artificial intelligence | 1995
George H. John; Pat Langley
international conference on machine learning | 1994
George H. John; Ron Kohavi; Karl Pfleger