Network


Latest external collaborations at the country level.

Hotspot


Dive into the research topics where Leo Breiman is active.

Publication


Featured research published by Leo Breiman.


Machine Learning | 2001

Random Forests

Leo Breiman

Random forests are a combination of tree predictors such that each tree depends on the values of a random vector sampled independently and with the same distribution for all trees in the forest. The generalization error for forests converges a.s. to a limit as the number of trees in the forest becomes large. The generalization error of a forest of tree classifiers depends on the strength of the individual trees in the forest and the correlation between them. Using a random selection of features to split each node yields error rates that compare favorably to Adaboost (Y. Freund & R. Schapire, Machine Learning: Proceedings of the Thirteenth International conference, ***, 148–156), but are more robust with respect to noise. Internal estimates monitor error, strength, and correlation and these are used to show the response to increasing the number of features used in the splitting. Internal estimates are also used to measure variable importance. These ideas are also applicable to regression.
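
As a rough illustration of the procedure described in this abstract, the sketch below grows each tree on an independent bootstrap sample, restricts each split to a random subset of features, and combines the trees by majority vote. It relies on scikit-learn's DecisionTreeClassifier; the number of trees, the square-root feature rule, and the assumption of integer class labels are illustrative choices, not details taken from the paper.

```python
import numpy as np
from sklearn.tree import DecisionTreeClassifier

def fit_random_forest(X, y, n_trees=100, rng=None):
    # Each tree: an independent bootstrap sample of (X, y) plus a random
    # subset of features considered at every split (max_features="sqrt").
    rng = np.random.default_rng(rng)
    trees = []
    for _ in range(n_trees):
        idx = rng.integers(0, len(X), size=len(X))        # bootstrap sample
        tree = DecisionTreeClassifier(
            max_features="sqrt",
            random_state=int(rng.integers(1 << 31)),
        )
        tree.fit(X[idx], y[idx])
        trees.append(tree)
    return trees

def forest_predict(trees, X):
    # Majority vote across trees; assumes nonnegative integer class labels.
    votes = np.stack([t.predict(X) for t in trees])       # (n_trees, n_samples)
    return np.array([np.bincount(col.astype(int)).argmax() for col in votes.T])
```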


Journal of the American Statistical Association | 1985

Estimating Optimal Transformations for Multiple Regression and Correlation.

Leo Breiman; Jerome H. Friedman

In regression analysis the response variable Y and the predictor variables X1, …, Xp are often replaced by functions θ(Y) and φ1(X1), …, φp(Xp). We discuss a procedure for estimating those functions θ and φ1, …, φp that minimize e² = E{[θ(Y) − Σj φj(Xj)]²} / var[θ(Y)], given only a sample {(yk, xk1, …, xkp), 1 ≤ k ≤ N} and making minimal assumptions concerning the data distribution or the form of the solution functions. For the bivariate case, p = 1, the optimal θ and φ satisfy ρ* = ρ(θ, φ) = max over θ, φ of ρ[θ(Y), φ(X)], where ρ is the product moment correlation coefficient and ρ* is the maximal correlation between X and Y. Our procedure thus also provides a method for estimating the maximal correlation between two variables.
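
The following is a minimal sketch of an alternating-conditional-expectations style iteration for the bivariate case (p = 1), assuming crude binned means as the smoother; the paper's actual data smooths, normalizations, and convergence checks are not reproduced here.

```python
import numpy as np

def bin_smooth(x, z, n_bins=20):
    # Crude estimate of E[z | x]: sort by x, split into bins, use bin means.
    order = np.argsort(x)
    out = np.empty(len(z), dtype=float)
    for idx in np.array_split(order, min(n_bins, len(x))):
        out[idx] = z[idx].mean()
    return out

def ace_bivariate(x, y, n_iter=50):
    # Alternate phi(x) ~ E[theta(y) | x] and theta(y) ~ E[phi(x) | y],
    # rescaling theta to unit variance after each pass (illustrative sketch).
    theta = (y - y.mean()) / y.std()
    for _ in range(n_iter):
        phi = bin_smooth(x, theta)
        theta = bin_smooth(y, phi)
        theta = (theta - theta.mean()) / theta.std()
    return theta, phi
```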


Technometrics | 1995

Better subset regression using the nonnegative garrote

Leo Breiman

A new method, called the nonnegative (nn) garrote, is proposed for doing subset regression. It both shrinks and zeroes coefficients. In tests on real and simulated data, it produces lower prediction error than ordinary subset selection. It is also compared to ridge regression. If the regression equations generated by a procedure do not change drastically with small changes in the data, the procedure is called stable. Subset selection is unstable, ridge is very stable, and the nn-garrote is intermediate. Simulation results illustrate the effects of instability on prediction error.
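
A hedged sketch of the nonnegative garrote idea described above: each ordinary least squares coefficient is scaled by a nonnegative factor c_j, and the constraint on the sum of the c_j is imposed here in a penalized (Lagrangian) form. The penalty parameter lam and the generic optimizer are illustrative choices, not the paper's algorithm.

```python
import numpy as np
from scipy.optimize import minimize

def nn_garrote(X, y, lam):
    # Shrink-and-zero the OLS coefficients via nonnegative factors c_j.
    beta_ols, *_ = np.linalg.lstsq(X, y, rcond=None)
    Z = X * beta_ols                      # column j is beta_ols[j] * X[:, j]

    def objective(c):
        r = y - Z @ c
        return r @ r + lam * c.sum()      # penalized form of sum(c) <= s

    p = X.shape[1]
    res = minimize(objective, x0=np.ones(p), bounds=[(0, None)] * p)
    return res.x * beta_ols               # garrote-shrunk coefficients
```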


Neural Computation | 1999

Prediction games and arcing algorithms

Leo Breiman

The theory behind the success of adaptive reweighting and combining algorithms (arcing) such as Adaboost (Freund & Schapire, 1996a, 1997) and others in reducing generalization error has not been well understood. By formulating prediction as a game where one player makes a selection from instances in the training set and the other a convex linear combination of predictors from a finite set, existing arcing algorithms are shown to be algorithms for finding good game strategies. The minimax theorem is an essential ingredient of the convergence proofs. An arcing algorithm is described that converges to the optimal strategy. A bound on the generalization error for the combined predictors in terms of their maximum error is proven that is sharper than bounds to date. Schapire, Freund, Bartlett, and Lee (1997) offered an explanation of why Adaboost works in terms of its ability to produce generally high margins. The empirical comparison of Adaboost to the optimal arcing algorithm shows that their explanation is not complete.
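
One plausible way to write down the game sketched in this abstract, assuming a zero-one loss payoff and notation that is ours rather than the paper's: player I chooses a distribution w over the N training instances, player II a convex combination Q over the finite predictor set F.

```latex
% A bilinear payoff over two probability simplices, so the minimax theorem applies:
\[
  \max_{w \in \Delta_N} \, \min_{Q \in \Delta_F}
  \sum_{n=1}^{N} w_n \sum_{f \in F} Q(f)\, \mathbf{1}\{f(x_n) \neq y_n\}
  \;=\;
  \min_{Q \in \Delta_F} \, \max_{w \in \Delta_N}
  \sum_{n=1}^{N} w_n \sum_{f \in F} Q(f)\, \mathbf{1}\{f(x_n) \neq y_n\}.
\]
```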


IEEE Transactions on Information Theory | 1993

Hinging hyperplanes for regression, classification, and function approximation

Leo Breiman

A hinge function y=h(x) consists of two hyperplanes continuously joined together at a hinge. In regression (prediction), classification (pattern recognition), and noiseless function approximation, use of sums of hinge functions gives a powerful and efficient alternative to neural networks with computation times several orders of magnitude less than is obtained by fitting neural networks with a comparable number of parameters. A simple and effective method for finding good hinges is presented.
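
A minimal sketch of the hinge functions described above: two hyperplanes joined continuously where they intersect, with models built as sums of such hinges. The max form shown here is one of the two variants (the min variant is analogous); how good hinges are found is left to the paper.

```python
import numpy as np

def hinge(X, beta_plus, beta_minus):
    # Two hyperplanes x @ beta_plus and x @ beta_minus joined at the hinge.
    # X is assumed to include an intercept column if offsets are needed.
    return np.maximum(X @ beta_plus, X @ beta_minus)

def hinge_sum(X, hinges):
    # The regression/approximation models are sums of hinge functions.
    return sum(hinge(X, bp, bm) for bp, bm in hinges)
```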


Journal of the Royal Statistical Society, Series B (Statistical Methodology) | 1997

Predicting Multivariate Responses in Multiple Linear Regression

Leo Breiman; Jerome H. Friedman

We look at the problem of predicting several response variables from the same set of explanatory variables. The question is how to take advantage of correlations between the response variables to improve predictive accuracy compared with the usual procedure of doing individual regressions of each response variable on the common set of predictor variables. A new procedure is introduced called the curds and whey method. Its use can substantially reduce prediction errors when there are correlations between responses while maintaining accuracy even if the responses are uncorrelated. In extensive simulations, the new procedure is compared with several previously proposed methods for predicting multiple responses (including partial least squares) and exhibits superior accuracy. One version can be easily implemented in the context of standard statistical packages.
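
For context, the sketch below implements only the baseline the abstract compares against, namely separate least squares regressions of each response on the common set of predictors; the curds and whey shrinkage of these predictions is described in the paper and not reproduced here.

```python
import numpy as np

def separate_ols(X, Y):
    # X: (n, p) predictors, Y: (n, q) responses.
    # One least squares coefficient vector per response (columns of B).
    B, *_ = np.linalg.lstsq(X, Y, rcond=None)   # shape (p, q)
    return B

# Predictions for new data are X_new @ B, one column per response.
```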


Journal of the American Statistical Association | 1992

The Little Bootstrap and other Methods for Dimensionality Selection in Regression: X-Fixed Prediction Error

Leo Breiman

When a regression problem contains many predictor variables, it is rarely wise to try to fit the data by means of a least squares regression on all of the predictor variables. Usually, a regression equation based on a few variables will be more accurate and certainly simpler. There are various methods for picking “good” subsets of variables, and programs that do such procedures are part of every widely used statistical package. The most common methods are based on stepwise addition or deletion of variables and on “best subsets.” The latter refers to a search method that, given the number of variables to be in the equation (say, five), locates that regression equation based on five variables that has the lowest residual sum of squares among all five-variable equations. All of these procedures generate a sequence of regression equations, the first based on one variable, the next based on two variables, and so on. Each member of this sequence is called a submodel and the number of variables in the e...
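
The “best subsets” search described in this abstract can be written down directly; the brute-force sketch below returns, for a given subset size k, the k-variable least squares equation with the lowest residual sum of squares. It is illustrative only and ignores the computational shortcuts real packages use.

```python
import numpy as np
from itertools import combinations

def best_subset(X, y, k):
    # Among all k-variable regressions, return the one with the lowest RSS.
    n, p = X.shape
    best_rss, best_vars, best_coef = np.inf, None, None
    for cols in combinations(range(p), k):
        Xk = X[:, cols]
        coef, *_ = np.linalg.lstsq(Xk, y, rcond=None)
        rss = np.sum((y - Xk @ coef) ** 2)
        if rss < best_rss:
            best_rss, best_vars, best_coef = rss, cols, coef
    return best_vars, best_coef, best_rss
```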


Machine Learning | 1999

Pasting Small Votes for Classification in Large Databases and On-Line

Leo Breiman

Many databases have grown to the point where they cannot fit into the fast memory of even large memory machines, to say nothing of current workstations. If what we want to do is to use these databases to construct predictions of various characteristics, then since the usual methods require that all data be held in fast memory, various work-arounds have to be used. This paper studies one such class of methods which give accuracy comparable to that which could have been obtained if all data could have been held in core and which are computationally fast. The procedure takes small pieces of the data, grows a predictor on each small piece and then pastes these predictors together. A version is given that scales up to terabyte data sets. The methods are also applicable to on-line learning.
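
A minimal sketch of the paste-small-votes idea as described above: grow a predictor on each small random “bite” of the data and combine their votes. The bite size, number of bites, sampling scheme, and choice of tree learner are illustrative assumptions, not the paper's recipe.

```python
import numpy as np
from sklearn.tree import DecisionTreeClassifier

def paste_votes(X, y, bite_size=800, n_bites=50, rng=None):
    # Grow one predictor per small random piece of the data.
    rng = np.random.default_rng(rng)
    predictors = []
    for _ in range(n_bites):
        idx = rng.choice(len(X), size=min(bite_size, len(X)), replace=False)
        predictors.append(DecisionTreeClassifier().fit(X[idx], y[idx]))
    return predictors

def vote(predictors, X):
    # Paste the predictors together by majority vote (integer class labels assumed).
    votes = np.stack([p.predict(X) for p in predictors])
    return np.array([np.bincount(v.astype(int)).argmax() for v in votes.T])
```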


Machine Learning | 2000

Randomizing Outputs to Increase Prediction Accuracy

Leo Breiman

Bagging and boosting reduce error by changing both the inputs and outputs to form perturbed training sets, growing predictors on these perturbed training sets and combining them. An interesting question is whether it is possible to get comparable performance by perturbing the outputs alone. Two methods of randomizing outputs are experimented with. One is called output smearing and the other output flipping. Both are shown to consistently do better than bagging.
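
Hedged sketches of the two output perturbations named above, assuming that smearing means adding mean-zero Gaussian noise to regression outputs and that flipping means randomly changing class labels; the paper's exact noise scales and flip rates are assumptions here, not reproduced values.

```python
import numpy as np

def output_smearing(y, scale=1.0, rng=None):
    # Regression outputs perturbed with mean-zero Gaussian noise.
    rng = np.random.default_rng(rng)
    y = np.asarray(y, dtype=float)
    return y + rng.normal(0.0, scale * y.std(), size=len(y))

def output_flipping(y, classes, flip_prob=0.1, rng=None):
    # Classification labels flipped to a different class with probability flip_prob.
    rng = np.random.default_rng(rng)
    y = np.asarray(y).copy()
    for i in np.where(rng.random(len(y)) < flip_prob)[0]:
        y[i] = rng.choice([c for c in classes if c != y[i]])
    return y
```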


Machine Learning | 1996

Technical Note: Some Properties of Splitting Criteria

Leo Breiman

Various criteria have been proposed for deciding which split is best at a given node of a binary classification tree. Consider the question: given a goodness-of-split criterion and the class populations of the instances at a node, what distribution of the instances between the two child nodes maximizes the goodness-of-split criterion? The answers reveal an interesting distinction between the Gini and entropy criteria.
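
For reference, the standard Gini and entropy node impurities, and the resulting goodness of split expressed as the decrease in impurity, can be written as below; the particular parameterization is ours.

```python
import numpy as np

def gini(p):
    # Gini impurity of a node with class proportions p.
    p = np.asarray(p, dtype=float)
    return 1.0 - np.sum(p ** 2)

def entropy(p):
    # Entropy impurity of a node with class proportions p.
    p = np.asarray(p, dtype=float)
    p = p[p > 0]
    return -np.sum(p * np.log(p))

def goodness_of_split(p_parent, p_left, p_right, frac_left, criterion=gini):
    # Decrease in impurity when a fraction frac_left of the node's instances
    # goes to the left child and the remainder to the right child.
    return (criterion(p_parent)
            - frac_left * criterion(p_left)
            - (1.0 - frac_left) * criterion(p_right))
```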

Collaboration


Dive into Leo Breiman's collaborations.
