Network


Most recent external collaborations at the country level.

Hotspot


Dive into the research topics where J. Brian Gray is active.

Publication


Featured research published by J. Brian Gray.


Computational Statistics & Data Analysis | 2008

Classification tree analysis using TARGET

J. Brian Gray; Guangzhe Fan

Tree models are valuable tools for predictive modeling and data mining. Traditional tree-growing methodologies such as CART are known to suffer from problems including greediness, instability, and bias in split rule selection. Alternative tree methods, including Bayesian CART (Chipman et al., 1998; Denison et al., 1998), random forests (Breiman, 2001a), bootstrap bumping (Tibshirani and Knight, 1999), QUEST (Loh and Shih, 1997), and CRUISE (Kim and Loh, 2001), have been proposed to resolve these issues from various aspects, but each has its own drawbacks. Gray and Fan (2003) described a genetic algorithm approach to constructing decision trees called tree analysis with randomly generated and evolved trees (TARGET) that performs a better search of the tree model space and largely resolves the problems with current tree modeling techniques. Utilizing the Bayesian information criterion (BIC), Fan and Gray (2005) developed a version of TARGET for regression tree analysis. In this article, we consider the construction of classification trees using TARGET. We modify the BIC to handle a categorical response variable and also adjust its penalty component to better account for the model complexity of TARGET. We also incorporate the option of splitting rules based on linear combinations of two or three variables in TARGET, which greatly improves the prediction accuracy of TARGET trees. Comparisons of TARGET to existing methods, using simulated and real data sets, indicate that TARGET has advantages over these other approaches.
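
A rough sense of the criterion involved: a BIC-style score for a classification tree trades the multinomial log-likelihood at the leaves against a penalty that grows with tree size. The sketch below is a generic version with an illustrative penalty; the paper's adjusted penalty for TARGET's model complexity differs, and all names here are ours.

import numpy as np

def classification_tree_bic(y, leaf_ids, n_splits):
    # y: integer class labels 0..k-1; leaf_ids: leaf index per observation.
    # Returns a generic BIC-style score (lower is better); the penalty
    # below is illustrative, not TARGET's adjusted penalty.
    n = len(y)
    k = y.max() + 1
    loglik = 0.0
    for leaf in np.unique(leaf_ids):
        counts = np.bincount(y[leaf_ids == leaf], minlength=k)
        probs = counts / counts.sum()
        nz = counts > 0
        loglik += np.sum(counts[nz] * np.log(probs[nz]))
    n_leaves = len(np.unique(leaf_ids))
    n_params = n_splits + n_leaves * (k - 1)   # splits + leaf probabilities
    return -2.0 * loglik + n_params * np.log(n)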


Journal of Computational and Graphical Statistics | 2005

Regression Tree Analysis Using TARGET

Guangzhe Fan; J. Brian Gray

Regression trees are a popular alternative to classical regression methods. A number of approaches exist for constructing regression trees. Most of these techniques, including CART, are sequential in nature and locally optimal at each node split, so the final tree solution found may not be the best tree overall. In addition, small changes in the training data often lead to large changes in the final result due to the relative instability of these greedy tree-growing algorithms. Ensemble techniques, such as random forests, attempt to take advantage of this instability by growing a forest of trees from the data and averaging their predictions. The predictive performance is improved, but the simplicity of a single-tree solution is lost. In earlier work, we introduced the Tree Analysis with Randomly Generated and Evolved Trees (TARGET) method for constructing classification trees via genetic algorithms. In this article, we extend the TARGET approach to regression trees. Simulated data and real-world data are used to illustrate the TARGET process and compare its performance to CART, Bayesian CART, and random forests. The empirical results indicate that TARGET regression trees have better predictive performance than recursive partitioning methods, such as CART, and single-tree stochastic search methods, such as Bayesian CART. The predictive performance of TARGET is slightly worse than that of ensemble methods, such as random forests, but the TARGET solutions are far more interpretable.
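
A compressed sketch of the genetic-algorithm loop this kind of stochastic tree search uses: keep a population of candidate trees, score each one (lower is better, e.g. a BIC-style criterion), and evolve by elitism, crossover, and mutation. This is a generic skeleton, not the paper's algorithm; the tree representation and operators are passed in as callables.

import random

def genetic_search(population, fitness, crossover, mutate, n_generations=50):
    # Generic GA skeleton: `fitness` maps a candidate to a score to minimize;
    # `crossover` combines two candidates; `mutate` perturbs one.
    population = list(population)
    for _ in range(n_generations):
        population.sort(key=fitness)
        elite = population[: max(2, len(population) // 5)]  # keep best 20%
        children = []
        while len(elite) + len(children) < len(population):
            a, b = random.sample(elite, 2)
            children.append(mutate(crossover(a, b)))
        population = elite + children
    return min(population, key=fitness)

In a TARGET-style search the candidates would be tree structures, crossover would exchange subtrees between two parents, and mutation would perturb a split rule.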


The Journal of Portfolio Management | 2004

History of the Forecasters

Robert Brooks; J. Brian Gray

An analysis of semiannual Wall Street Journal long-term interest rate forecasts made since 1982 by a panel of distinguished economic experts shows that the consensus forecast of long-term U.S. Treasury bond yield changes is poor. A naive forecast is more accurate.
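
In this context a naive forecast is the "no change" benchmark: predict that next period's yield equals the current yield. Comparing forecasts then reduces to comparing errors, for example by mean absolute error. A toy illustration with invented numbers (not data from the study):

import numpy as np

# Invented yields (%) at consecutive survey dates, for illustration only.
actual = np.array([10.4, 11.8, 11.2, 9.5, 8.9, 9.1])
consensus = np.array([10.9, 10.1, 10.8, 10.2, 9.6])  # forecasts of actual[1:]
naive = actual[:-1]                                   # "no change" forecast

print(np.mean(np.abs(consensus - actual[1:])))  # consensus MAE
print(np.mean(np.abs(naive - actual[1:])))      # naive MAE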


The American Statistician | 1997

Elemental Subsets: The Building Blocks of Regression

Matthew S. Mayo; J. Brian Gray

In a regression dataset, an elemental subset consists of the minimum number of cases required to estimate the unknown parameters of a regression model. The resulting elemental regression provides an exact fit to the cases in the elemental subset. Early methods of regression estimation were based on combining the results of elemental regressions. This approach was abandoned because of its computational infeasibility in all but the smallest datasets and because of the arrival of the least squares method. With the computing power available today, there has been renewed interest in making use of the elemental regressions for model fitting and diagnostic purposes. In this paper we consider the elemental subsets and their associated elemental regressions as useful “building blocks” for the estimation of regression models. Many existing estimators can be expressed in terms of the elemental regressions. We introduce a new classification of regression estimators that generalizes a characterization of ordinary least squares…
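
Concretely: with p regression coefficients, an elemental subset is p cases, and the elemental regression solves the resulting p-by-p linear system exactly. A minimal numpy sketch of enumerating them (our own illustration, feasible only for small n and p):

import numpy as np
from itertools import combinations

def elemental_regressions(X, y):
    # X: n-by-p design matrix (intercept column included); y: n-vector.
    # Yields each nonsingular p-case subset with its exact-fit coefficients.
    n, p = X.shape
    for subset in combinations(range(n), p):
        Xs = X[list(subset)]
        if abs(np.linalg.det(Xs)) > 1e-12:       # skip singular subsets
            yield subset, np.linalg.solve(Xs, y[list(subset)])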


Computational Statistics & Data Analysis | 1997

Leverage, residual, and interaction diagnostics for subsets of cases in least squares regression

Bruce E. Barrett; J. Brian Gray

Leverage and residual values are useful general diagnostics in least squares regression because all single-case influence measures are functions of these two basic components. Recent work in the area of robust diagnostics has suggested that ordinary leverage and residual values can be ineffective in the presence of “masking” and other multiple-case effects, but Kempthorne and Mendel (1990) and others have pointed out that satisfactory definitions of “leverage” and “residual” for subsets of cases might overcome these problems. In this article, we propose a set of three simple, yet general and comprehensive, subset diagnostics (referred to as leverage, residual, and interaction) that have the desirable characteristics of single-case leverage and residual diagnostics. Most importantly, the proposed measures are the basis of several existing subset influence measures, including Cook's distance. We illustrate how these basic diagnostics usefully complement existing multiple outlier detection procedures and subset influence measures in understanding the influence structure within a regression data set.
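
For reference, the single-case building blocks that the subset versions generalize: leverage is a diagonal entry of the hat matrix, the internally studentized residual rescales the raw residual, and Cook's distance combines the two. A standard computation (textbook formulas, not code from the article):

import numpy as np

def single_case_diagnostics(X, y):
    # Standard single-case least-squares diagnostics.
    n, p = X.shape
    H = X @ np.linalg.inv(X.T @ X) @ X.T     # hat matrix
    h = np.diag(H)                           # leverages
    e = y - H @ y                            # residuals
    s2 = e @ e / (n - p)                     # residual variance estimate
    r = e / np.sqrt(s2 * (1 - h))            # internally studentized residuals
    d = (r ** 2 / p) * h / (1 - h)           # Cook's distance per case
    return h, r, d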


Statistics and Computing | 1994

A computational framework for variable selection in multivariate regression

Bruce E. Barrett; J. Brian Gray

Stepwise variable selection procedures are computationally inexpensive methods for constructing useful regression models for a single dependent variable. At each step a variable is entered into or deleted from the current model, based on the criterion of minimizing the error sum of squares (SSE). When there is more than one dependent variable, the situation is more complex. In this article we propose variable selection criteria for multivariate regression which generalize the univariate SSE criterion. Specifically, we suggest minimizing some function of the estimated error covariance matrix: the trace, the determinant, or the largest eigenvalue. The computations associated with these criteria may be burdensome. We develop a computational framework based on the use of the SWEEP operator which greatly reduces these calculations for stepwise variable selection in multivariate regression.
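
For instance, forward selection under the trace criterion adds, at each step, the candidate predictor that most reduces tr(E'E), where E is the matrix of residuals from the multivariate fit. The sketch below recomputes each fit directly for clarity; the paper's contribution is doing this cheaply with the SWEEP operator instead.

import numpy as np

def forward_select_trace(X, Y, n_steps):
    # X: n-by-q candidate predictors, Y: n-by-m responses (both centered).
    # Greedy forward selection minimizing the trace of the error SSCP matrix.
    selected = []
    for _ in range(n_steps):
        best_j, best_trace = None, np.inf
        for j in range(X.shape[1]):
            if j in selected:
                continue
            Xs = X[:, selected + [j]]
            B, *_ = np.linalg.lstsq(Xs, Y, rcond=None)
            E = Y - Xs @ B                   # residual matrix for this model
            t = np.trace(E.T @ E)            # trace criterion
            if t < best_trace:
                best_j, best_trace = j, t
        selected.append(best_j)
    return selected

Swapping np.trace for np.linalg.det or the largest eigenvalue of E.T @ E gives the determinant and largest-eigenvalue criteria mentioned above.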


The American Statistician | 1994

The Maximum Size of Standardized and Internally Studentized Residuals in Regression Analysis

J. Brian Gray; William H. Woodall

Shiffler (1988) showed that the magnitude of the largest Z-score in a univariate data set is bounded above by (n − 1)/√n. Similar bounds hold for standardized and internally studentized residuals in regression analysis. The implications of these bounds for outlier identification in regression do not appear to be widely recognized. Many regression textbooks contain recommendations for residual analysis that are not appropriate in light of these results.
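
The bound is easy to verify numerically: even a wildly extreme case cannot push its Z-score past (n − 1)/√n, which for n = 5 is about 1.79, far below a common "|z| > 3" cutoff. A quick check (our own illustration):

import numpy as np

x = np.array([10.0, 11.0, 9.0, 10.5, 1e6])   # one extreme case, n = 5
z = (x - x.mean()) / x.std(ddof=1)           # Z-scores using the sample SD
print(z.max(), (len(x) - 1) / np.sqrt(len(x)))   # max Z vs. its upper bound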


Journal of Computational and Graphical Statistics | 1992

Efficient Computation of Subset Influence in Regression

Bruce E. Barrett; J. Brian Gray

The detection of influential cases is now accepted as an essential component of regression diagnostics. It is also well established that two or more cases that are individually regarded as noninfluential may act in concert to achieve a high level of joint influence. However, for the majority of data sets it is computationally infeasible to calculate the influence for all subsets of a given size. In this article we address this problem and suggest an algorithm that greatly reduces the computational effort by making use of a sequence of upper bounds on the influence value. These upper bounds are much less costly to evaluate and greatly reduce the number of subsets for which the influence value must be explicitly determined.
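
The pruning pattern in miniature: evaluate a cheap upper bound first and pay for the exact influence computation only when the bound could still beat the current k-th best subset. A generic sketch with the bound and influence measures passed in as callables (they stand in for the paper's specific quantities):

from itertools import combinations

def top_influential_subsets(n_cases, size, bound_fn, influence_fn, top_k=5):
    # Keeps the top_k subsets by influence, skipping exact evaluation
    # whenever the upper bound cannot beat the current k-th best value.
    best = []                               # (influence, subset), ascending
    for subset in combinations(range(n_cases), size):
        cutoff = best[0][0] if len(best) == top_k else float("-inf")
        if bound_fn(subset) <= cutoff:
            continue                        # the bound rules this subset out
        val = influence_fn(subset)          # expensive exact computation
        best.append((val, subset))
        best.sort()
        best = best[-top_k:]
    return best[::-1]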


Statistics and Computing | 1996

Computation of determinantal subset influence in regression

Bruce E. Barrett; J. Brian Gray

One of the important goals of regression diagnostics is the detection of cases or groups of cases which have an inordinate impact on the regression results. Such observations are generally described as ‘influential’. A number of influence measures have been proposed, each focusing on a different aspect of the regression. For single cases, these measures are relatively simple and inexpensive to calculate. However, the detection of multiple-case or joint influence is more difficult on two counts. First, calculation of influence for a single subset is more involved than for an individual case, and second, the sheer number of subsets of cases makes the computation overwhelming for all but the smallest data sets.

Barrett and Gray (1992) described methods for efficiently examining subset influence for those measures that can be expressed as the trace of a product of positive semidefinite (psd) matrices. There are, however, other popular measures that do not take this form, but rather are expressible as the ratio of determinants of psd matrices. This article focuses on reducing the computation for the determinantal ratio measures by making use of upper and lower bounds on the influence to limit the number of subsets for which the actual influence must be explicitly determined.
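
One determinantal building block can be computed without refitting: deleting the subset I changes det(X'X) by the factor det(I − H_I), where H_I is the corresponding block of the hat matrix. A small sketch of that identity in code (an illustrative component, not the paper's full measures):

import numpy as np

def deletion_det_ratio(X, subset):
    # Returns det(X_(I)'X_(I)) / det(X'X) for the design matrix with the
    # cases in `subset` deleted, via det(X'X - Xi'Xi) = det(X'X) det(I - Hi).
    Xi = X[list(subset)]
    Hi = Xi @ np.linalg.inv(X.T @ X) @ Xi.T   # subset block of the hat matrix
    return np.linalg.det(np.eye(len(subset)) - Hi)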


Probabilistic Engineering Mechanics | 2001

The robustness and efficiency of trimmed elemental estimation in regression analysis: a Monte Carlo simulation study

Matthew S. Mayo; J. Brian Gray

Mayo and Gray [Am Statist 51 (1997) 122] introduced the leverage-residual weighted elemental (LRWE) classification of regression estimators and proposed a new method of estimation called trimmed elemental estimation (TEE). In this article, we perform a simulation study of the efficiency of certain TEE estimators relative to ordinary least squares under normal errors and their robustness under various non-normal error distributions in the context of the simple linear regression model. Comparisons among these estimators are made on the basis of mean square error and percentiles of the absolute estimation errors in the simulations.
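
The shape of such a simulation is simple: repeatedly draw samples from a simple linear model under a chosen error law, fit each estimator, and compare the mean squared error of the slope estimates. A bare-bones version with a crude trim-and-refit estimator standing in for TEE (the TEE estimators themselves are not implemented here):

import numpy as np

rng = np.random.default_rng(0)

def trimmed_refit_slope(x, y, trim=0.2):
    # Stand-in "trimmed" estimator, not the paper's TEE: fit OLS, drop the
    # fraction `trim` of cases with the largest absolute residuals, refit.
    b = np.polyfit(x, y, 1)
    resid = np.abs(y - np.polyval(b, x))
    keep = np.argsort(resid)[: int(len(x) * (1 - trim))]
    return np.polyfit(x[keep], y[keep], 1)[0]

def simulate(n=30, reps=2000):
    # Monte Carlo slope-MSE comparison under heavy-tailed (t, 2 df) errors.
    true_slope, mse_ols, mse_trim = 2.0, 0.0, 0.0
    for _ in range(reps):
        x = rng.uniform(0, 10, n)
        y = 1.0 + true_slope * x + rng.standard_t(2, n)
        mse_ols += (np.polyfit(x, y, 1)[0] - true_slope) ** 2
        mse_trim += (trimmed_refit_slope(x, y) - true_slope) ** 2
    return mse_ols / reps, mse_trim / reps

print(simulate())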

Collaboration


Dive into J. Brian Gray's collaborations.

Top Co-Authors

Joseph Lipscomb, Texas Christian University
Jie Xu, University of Alabama