Jerome H. Friedman
Stanford University
Publications
Featured research published by Jerome H. Friedman.
ACM Transactions on Mathematical Software | 1977
Jerome H. Friedman; Jon Louis Bentley; Raphael A. Finkel
An algorithm and data structure are presented for searching a file containing N records, each described by k real-valued keys, for the m closest matches or nearest neighbors to a given query record. The computation required to organize the file is proportional to kN log N. The expected number of records examined in each search is independent of the file size. The expected computation to perform each search is proportional to log N. Empirical evidence suggests that except for very small files, this algorithm is considerably faster than other methods.
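The data structure described here is what became known as the k-d tree. Below is a minimal sketch of the core idea under the paper's heuristics: split each node at the median of the coordinate with the widest spread, then answer nearest-neighbor queries with branch-and-bound pruning. It is an illustration of the technique, not the paper's optimized implementation.

```python
import numpy as np

class Node:
    __slots__ = ("point", "axis", "left", "right")
    def __init__(self, point, axis, left, right):
        self.point, self.axis = point, axis
        self.left, self.right = left, right

def build(points):
    """Recursively build a k-d tree, splitting each node at the median
    of the coordinate with the largest spread."""
    if len(points) == 0:
        return None
    axis = int(np.argmax(points.max(axis=0) - points.min(axis=0)))
    points = points[np.argsort(points[:, axis])]
    mid = len(points) // 2
    return Node(points[mid], axis,
                build(points[:mid]), build(points[mid + 1:]))

def nearest(node, query, best=None):
    """Depth-first search; prune a subtree when the splitting plane is
    farther away than the best distance found so far."""
    if node is None:
        return best
    d = np.linalg.norm(query - node.point)
    if best is None or d < best[0]:
        best = (d, node.point)
    diff = query[node.axis] - node.point[node.axis]
    near, far = (node.left, node.right) if diff < 0 else (node.right, node.left)
    best = nearest(near, query, best)
    if abs(diff) < best[0]:          # query hypersphere crosses the splitting plane
        best = nearest(far, query, best)
    return best

rng = np.random.default_rng(0)
tree = build(rng.normal(size=(1000, 3)))
dist, pt = nearest(tree, rng.normal(size=3))
```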
Journal of the American Statistical Association | 1989
Jerome H. Friedman
Linear and quadratic discriminant analysis are considered in the small-sample, high-dimensional setting. Alternatives to the usual maximum likelihood (plug-in) estimates for the covariance matrices are proposed. These alternatives are characterized by two parameters, the values of which are customized to individual situations by jointly minimizing a sample-based estimate of future misclassification risk. Computationally fast implementations are presented, and the efficacy of the approach is examined through simulation studies and application to data. These studies indicate that in many circumstances dramatic gains in classification accuracy can be achieved.
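This two-parameter covariance shrinkage is known as regularized discriminant analysis. A hedged sketch of the idea follows: one parameter (lam here) pools each class covariance toward the overall pooled covariance, and a second (gam) shrinks the result toward a scaled identity. The parameter names and exact weighting are simplifications; in the paper the pair is chosen by minimizing a sample-based estimate of misclassification risk, e.g. via cross-validation.

```python
import numpy as np

def rda_covariances(X, y, lam, gam):
    """Two-parameter shrinkage of class covariance estimates: pool toward
    the overall covariance (lam), then shrink toward a scaled identity (gam)."""
    classes = np.unique(y)
    p = X.shape[1]
    covs = {k: np.cov(X[y == k].T) for k in classes}
    pooled = sum((y == k).sum() * covs[k] for k in classes) / len(y)
    out = {}
    for k in classes:
        S = (1 - lam) * covs[k] + lam * pooled                        # pool across classes
        out[k] = (1 - gam) * S + gam * (np.trace(S) / p) * np.eye(p)  # shrink to identity
    return out
```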
Journal of the American Statistical Association | 1981
Jerome H. Friedman; Werner Stuetzle
A new method for nonparametric multiple regression is presented. The procedure models the regression surface as a sum of general smooth functions of linear combinations of the predictor variables in an iterative manner. It is more general than standard stepwise and stagewise regression procedures, does not require the definition of a metric in the predictor space, and lends itself to graphical interpretation.
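The iterative fit described here (projection pursuit regression) builds the surface one ridge function at a time. The sketch below is a deliberately crude rendition of that loop, assuming a kernel smoother and a random search over directions in place of the paper's smoother and numerical direction optimization.

```python
import numpy as np

def smooth(t, r, bandwidth=0.3):
    """Nadaraya-Watson kernel smoother of values r against projections t."""
    w = np.exp(-0.5 * ((t[:, None] - t[None, :]) / bandwidth) ** 2)
    return w @ r / w.sum(axis=1)

def ppr_fit(X, y, n_terms=3, n_dirs=200, seed=0):
    """Greedily fit y ~ mean + sum of ridge functions s_m(a_m . x)."""
    rng = np.random.default_rng(seed)
    residual = y - y.mean()
    fit = np.full(len(y), y.mean())
    directions = []
    for _ in range(n_terms):
        best = None
        for _ in range(n_dirs):                   # crude random direction search
            a = rng.normal(size=X.shape[1])
            a /= np.linalg.norm(a)
            s = smooth(X @ a, residual)
            sse = np.sum((residual - s) ** 2)
            if best is None or sse < best[0]:
                best = (sse, a, s)
        _, a, s = best
        residual = residual - s                   # peel off the fitted ridge term
        fit = fit + s
        directions.append(a)
    return fit, directions
```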
Technometrics | 1993
Ildiko E. Frank; Jerome H. Friedman
Chemometrics is a field of chemistry that studies the application of statistical methods to chemical data analysis. In addition to borrowing many techniques from the statistics and engineering literatures, chemometrics itself has given rise to several new data-analytical methods. This article examines two methods commonly used in chemometrics for predictive modeling—partial least squares and principal components regression—from a statistical perspective. The goal is to try to understand their apparent successes and in what situations they can be expected to work well and to compare them with other statistical methods intended for those situations. These methods include ordinary least squares, variable subset selection, and ridge regression.
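For a concrete sense of the comparison, here is a small scikit-learn sketch fitting ordinary least squares, ridge, principal components regression, and partial least squares on synthetic data with low-dimensional latent structure. The component counts and ridge penalty are arbitrary illustrative choices, not tuned values.

```python
import numpy as np
from sklearn.cross_decomposition import PLSRegression
from sklearn.decomposition import PCA
from sklearn.linear_model import LinearRegression, Ridge
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import make_pipeline

rng = np.random.default_rng(0)
n, p = 100, 20
latent = rng.normal(size=(n, 3))                      # 3 latent factors drive everything
X = latent @ rng.normal(size=(3, p)) + 0.1 * rng.normal(size=(n, p))
y = latent[:, 0] + 0.5 * rng.normal(size=n)

models = {
    "OLS": LinearRegression(),
    "ridge": Ridge(alpha=1.0),
    "PCR": make_pipeline(PCA(n_components=3), LinearRegression()),
    "PLS": PLSRegression(n_components=3),
}
for name, model in models.items():
    score = cross_val_score(model, X, y, cv=5, scoring="r2").mean()
    print(f"{name:6s} CV R^2 = {score:.3f}")
```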
IEEE Transactions on Computers | 1974
Jerome H. Friedman; John W. Tukey
An algorithm for the analysis of multivariate data is presented and is discussed in terms of specific examples. The algorithm seeks to find one- and two-dimensional linear projections of multivariate data that are relatively highly revealing.
The Annals of Applied Statistics | 2007
Jerome H. Friedman; Trevor Hastie; Holger Höfling; Robert Tibshirani
We consider “one-at-a-time” coordinate-wise descent algorithms for a class of convex optimization problems. An algorithm of this kind has been proposed for the L1-penalized regression (lasso) in the literature, but it seems to have been largely ignored. Indeed, it seems that coordinate-wise algorithms are not often used in convex optimization. We show that this algorithm is very competitive with the well-known LARS (or homotopy) procedure in large lasso problems, and that it can be applied to related methods such as the garotte and elastic net. It turns out that coordinate-wise descent does not work in the “fused lasso,” however, so we derive a generalized algorithm that yields the solution in much less time than a standard convex optimizer. Finally, we generalize the procedure to the two-dimensional fused lasso, and demonstrate its performance on some image smoothing problems.
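The lasso update at the heart of this approach has a closed form per coordinate: soft-threshold the univariate least-squares solution computed on the partial residual. A compact sketch, written for clarity rather than speed, using the (1/2n) least-squares objective convention:

```python
import numpy as np

def soft_threshold(z, g):
    return np.sign(z) * max(abs(z) - g, 0.0)

def lasso_cd(X, y, lam, n_iters=100):
    """Coordinate descent for (1/2n)||y - X b||^2 + lam ||b||_1."""
    n, p = X.shape
    beta = np.zeros(p)
    residual = y.astype(float).copy()       # residual for beta = 0
    col_ss = (X ** 2).sum(axis=0)           # per-column sum of squares
    for _ in range(n_iters):
        for j in range(p):
            residual += X[:, j] * beta[j]                    # remove j's contribution
            z = X[:, j] @ residual                           # univariate LS numerator
            beta[j] = soft_threshold(z, n * lam) / col_ss[j]
            residual -= X[:, j] * beta[j]                    # restore with new value
    return beta
```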
Journal of the American Statistical Association | 1985
Leo Breiman; Jerome H. Friedman
In regression analysis the response variable Y and the predictor variables X1, …, Xp are often replaced by functions θ(Y) and φ1(X1), …, φp(Xp). We discuss a procedure for estimating those functions θ and φ1, …, φp that minimize e² = E{[θ(Y) − Σj φj(Xj)]²} / var[θ(Y)], given only a sample {(yk, xk1, …, xkp), 1 ≤ k ≤ N} and making minimal assumptions concerning the data distribution or the form of the solution functions. For the bivariate case, p = 1, the optimal θ and φ satisfy ρ* = ρ(θ, φ) = max_{θ,φ} ρ[θ(Y), φ(X)], where ρ is the product-moment correlation coefficient and ρ* is the maximal correlation between X and Y. Our procedure thus also provides a method for estimating the maximal correlation between two variables.
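The procedure referred to is the alternating conditional expectations (ACE) algorithm. A toy bivariate sketch of the alternation follows, with a kernel smoother standing in for the paper's data smoothers; the sample correlation of the returned transforms then estimates the maximal correlation ρ*.

```python
import numpy as np

def smooth(t, values, bandwidth=0.3):
    """Kernel estimate of E[values | t] at the sample points."""
    w = np.exp(-0.5 * ((t[:, None] - t[None, :]) / bandwidth) ** 2)
    return w @ values / w.sum(axis=1)

def ace_bivariate(x, y, n_iters=20):
    """Alternate the two conditional-expectation smooths, keeping var(theta) = 1."""
    theta = (y - y.mean()) / y.std()
    for _ in range(n_iters):
        phi = smooth(x, theta)                        # phi(x)   <- E[theta(y) | x]
        theta = smooth(y, phi)                        # theta(y) <- E[phi(x) | y]
        theta = (theta - theta.mean()) / theta.std()  # normalize
    return theta, phi

# Nonlinear dependence with near-zero linear correlation; the correlation of
# the fitted transforms estimates the maximal correlation instead.
rng = np.random.default_rng(0)
x = rng.normal(size=500)
y = x ** 2 + 0.1 * rng.normal(size=500)
theta, phi = ace_bivariate(x, y)
print(np.corrcoef(theta, phi)[0, 1])
```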
Data Mining and Knowledge Discovery | 1997
Jerome H. Friedman
The classification problem is considered in which an output variable y assumes discrete values with respective probabilities that depend upon the simultaneous values of a set of input variables x = {x1, …, xn}. At issue is how error in the estimates of these probabilities affects classification error when the estimates are used in a classification rule. These effects are seen to be somewhat counterintuitive in both their strength and nature. In particular, the bias and variance components of the estimation error combine to influence classification in a very different way than with squared error on the probabilities themselves. Certain types of (very high) bias can be canceled by low variance to produce accurate classification. This can dramatically mitigate the effect of the bias associated with some simple estimators like “naive” Bayes, and the bias induced by the curse of dimensionality on nearest-neighbor procedures. This helps explain why such simple methods are often competitive with, and sometimes superior to, more sophisticated ones for classification, and why “bagging/aggregating” classifiers can often improve accuracy. These results also suggest simple modifications to these procedures that can (sometimes dramatically) further improve their classification performance.
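The central point, that bias which keeps the estimate on the correct side of the 1/2 boundary is harmless for classification, can be seen in a toy simulation. Everything below is illustrative: the "biased" estimator exaggerates probabilities away from 1/2, a caricature of naive-Bayes-style overconfidence.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 100_000
p_true = rng.uniform(0.05, 0.95, size=n)   # true P(y = 1 | x) across query points

# An unbiased but noisy estimate vs. a biased but stable one that never
# crosses the 1/2 boundary.
p_noisy = np.clip(p_true + rng.normal(0.0, 0.3, size=n), 0.0, 1.0)
p_biased = np.clip(0.5 + 2.0 * (p_true - 0.5), 0.0, 1.0)

for name, p_hat in [("unbiased, high variance", p_noisy),
                    ("biased, low variance", p_biased)]:
    # Fraction of points where the plug-in rule disagrees with the Bayes rule;
    # only these points contribute excess classification error.
    disagree = np.mean((p_hat > 0.5) != (p_true > 0.5))
    print(f"{name}: disagreement with Bayes rule = {disagree:.3f}")
```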
Journal of the American Statistical Association | 1987
Jerome H. Friedman
A new projection pursuit algorithm for exploring multivariate data is presented that has both statistical and computational advantages over previous methods. A number of practical issues concerning its application are addressed. A connection to multivariate density estimation is established, and its properties are investigated through simulation studies and application to real data. The goal of exploratory projection pursuit is to use the data to find low- (one-, two-, or three-) dimensional projections that provide the most revealing views of the full-dimensional data. With these views the human gift for pattern recognition can be applied to help discover effects that may not have been anticipated in advance. Since linear effects are directly captured by the covariance structure of the variable pairs (which are straightforward to estimate), the emphasis here is on the discovery of nonlinear effects such as clustering or other general nonlinear associations among the variables. Although arbitrary ...
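A toy version of the search this abstract describes: sphere the data so covariance (linear) structure is removed, then scan random two-dimensional projections and keep the least Gaussian-looking one. The simple moment-based index below is a crude stand-in for the paper's projection index.

```python
import numpy as np

def sphere(X):
    """Center and whiten so linear (covariance) structure is removed."""
    Xc = X - X.mean(axis=0)
    vals, vecs = np.linalg.eigh(np.cov(Xc.T))
    return Xc @ vecs / np.sqrt(vals)

def non_normality(z):
    """Departure of skewness/kurtosis from Gaussian values, summed over coordinates."""
    skew = np.mean(z ** 3, axis=0)
    kurt = np.mean(z ** 4, axis=0) - 3.0
    return float(np.sum(skew ** 2 + kurt ** 2))

def best_projection(X, n_tries=500, seed=0):
    rng = np.random.default_rng(seed)
    Z = sphere(X)
    best = None
    for _ in range(n_tries):
        A, _ = np.linalg.qr(rng.normal(size=(X.shape[1], 2)))  # random 2-D frame
        score = non_normality(Z @ A)
        if best is None or score > best[0]:
            best = (score, A)
    return best[1]                      # columns span the most "revealing" plane found
```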
Journal of Computational and Graphical Statistics | 2013
Noah Simon; Jerome H. Friedman; Trevor Hastie; Robert Tibshirani
For high-dimensional supervised learning problems, using problem-specific assumptions can often lead to greater accuracy. For problems with grouped covariates, which are believed to have sparse effects at both the group and within-group level, we introduce a regularized model for linear regression with ℓ1 and ℓ2 penalties. We discuss the sparsity and other regularization properties of the optimal fit for this model, and show that it has the desired effect of group-wise and within-group sparsity. We propose an algorithm to fit the model via accelerated generalized gradient descent, and extend this model and algorithm to convex loss functions. We also demonstrate the efficacy of our model and the efficiency of our algorithm on simulated data. This article has online supplementary material.
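The model described is the sparse-group lasso. Under the usual formulation of its penalty, the proximal step that drives an accelerated generalized gradient method has a simple closed form, sketched below: coordinate-wise soft-thresholding for the ℓ1 part, followed by group-wise shrinkage for the ℓ2 part. Group weights and step-size details are simplified assumptions here.

```python
import numpy as np

def soft_threshold(v, t):
    return np.sign(v) * np.maximum(np.abs(v) - t, 0.0)

def sgl_prox(beta, groups, step, lam, alpha):
    """Proximal operator of step * [alpha*lam*||b||_1 + (1-alpha)*lam*sum_g ||b_g||_2].
    groups is a list of index lists partitioning the coefficients."""
    out = soft_threshold(beta, step * alpha * lam)    # within-group sparsity (l1)
    for g in groups:
        norm = np.linalg.norm(out[g])
        scale = max(1.0 - step * (1 - alpha) * lam / norm, 0.0) if norm > 0 else 0.0
        out[g] = scale * out[g]                       # whole groups can vanish (l2)
    return out
```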