
Publication


Featured research published by David C. Hoaglin.


The American Statistician | 1978

The Hat Matrix in Regression and ANOVA

David C. Hoaglin; Roy E. Welsch

Abstract In least-squares fitting it is important to understand the influence which a data y value will have on each fitted y value. A projection matrix known as the hat matrix contains this information and, together with the Studentized residuals, provides a means of identifying exceptional data points. This approach also simplifies the calculations involved in removing a data point, and it requires only simple modifications in the preferred numerical least-squares algorithms.
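As a rough illustration of the idea in this abstract, the sketch below builds the hat matrix for the simplest case, a straight-line fit y = a + b·x, where the entries have the closed form h_ij = 1/n + (x_i − x̄)(x_j − x̄)/S_xx. The function name and the data are made up for illustration and are not taken from the paper.

```python
def hat_matrix_simple_regression(x):
    """Hat matrix H for the straight-line fit y = a + b*x (illustrative helper)."""
    n = len(x)
    x_bar = sum(x) / n
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    # Closed form for simple regression: h_ij = 1/n + (x_i - x_bar)(x_j - x_bar) / Sxx
    return [[1.0 / n + (xi - x_bar) * (xj - x_bar) / sxx for xj in x] for xi in x]


# Made-up data: the last x value lies far from the others.
x = [1.0, 2.0, 3.0, 4.0, 10.0]
y = [1.1, 1.9, 3.2, 3.9, 10.5]

H = hat_matrix_simple_regression(x)

# Diagonal entries (leverages) show how strongly each y value pulls its own fitted value.
leverages = [H[i][i] for i in range(len(x))]

# Each fitted value is a weighted sum of all observed y values: y_hat = H y.
fitted = [sum(H[i][j] * y[j] for j in range(len(y))) for i in range(len(y))]

print(leverages)
print(fitted)
```

The leverages sum to the number of fitted parameters (two here), so a single value close to 1 signals a point that largely determines its own fitted value.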


Journal of the American Statistical Association | 1987

Fine-Tuning Some Resistant Rules for Outlier Labeling

David C. Hoaglin; Boris Iglewicz

Abstract A previous study examined the performance of a standard rule from Exploratory Data Analysis, which uses the sample fourths, F_L and F_U, and labels as “outside” any observations below F_L − k(F_U − F_L) or above F_U + k(F_U − F_L), customarily with k = 1.5. In terms of the order statistics X_(1) ≤ X_(2) ≤ ... ≤ X_(n), the standard definition of the fourths is F_L = X_(f) and F_U = X_(n + 1 − f), where f = ½[(n + 3)/2] and [·] denotes the greatest-integer function. The results of that study suggest that finer interpolation for the fourths might yield smoother behavior in the face of varying sample size. In this article we show that using f_i = n/4 + (5/12) to define the fourths produces the desired smoothness. Corresponding to a common definition of quartiles, f_Q = n/4 + (1/4) leads to similar results. Instead of allowing the some-outside rate per sample (the probability that a sample contains one or more outside observations, analogous to the experimentwise error rate in simultaneous inference) to vary, some us...
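The three depth formulas quoted in this abstract can be compared with a small sketch. It assumes that a fractional depth is handled by linear interpolation between the two adjacent order statistics; the truncated abstract does not spell out the interpolation, so that choice, like the function names and the sample data, is an assumption made for illustration.

```python
import math


def depth_standard(n):
    # f = (1/2) * [(n + 3) / 2], with [.] the greatest-integer function
    return 0.5 * math.floor((n + 3) / 2)


def depth_ideal(n):
    # f = n/4 + 5/12
    return n / 4 + 5 / 12


def depth_quartile(n):
    # f = n/4 + 1/4
    return n / 4 + 1 / 4


def order_stat(xs, depth):
    """Value at a 1-based, possibly fractional depth in the sorted list xs.

    Assumption: linear interpolation between adjacent order statistics.
    """
    if not 1 <= depth <= len(xs):
        raise ValueError("depth outside the sample")
    lo = math.floor(depth)
    frac = depth - lo
    if frac == 0:
        return xs[lo - 1]
    return xs[lo - 1] + frac * (xs[lo] - xs[lo - 1])


def fourths(data, depth_rule):
    """Lower and upper fourths: depth f from the bottom and depth f from the top."""
    xs = sorted(data)
    n = len(xs)
    f = depth_rule(n)
    return order_stat(xs, f), order_stat(xs, n + 1 - f)


sample = [-5, -2, 0, 1, 8]  # illustrative data
for rule in (depth_standard, depth_ideal, depth_quartile):
    print(rule.__name__, fourths(sample, rule))
```

The two interpolated formulas need at least three observations here, because their depth falls below 1 for smaller samples.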


Journal of the American Statistical Association | 1986

Performance of Some Resistant Rules for Outlier Labeling

David C. Hoaglin; Boris Iglewicz; John W. Tukey

Abstract The techniques of exploratory data analysis include a resistant rule for identifying possible outliers in univariate data. Using the lower and upper fourths, F_L and F_U (approximate quartiles), it labels as “outside” any observations below F_L − 1.5(F_U − F_L) or above F_U + 1.5(F_U − F_L). For example, in the ordered sample −5, −2, 0, 1, 8, F_L = −2 and F_U = 1, so any observation below −6.5 or above 5.5 is outside. Thus the rule labels 8 as outside. Some related rules also use cutoffs of the form F_L − k(F_U − F_L) and F_U + k(F_U − F_L). This approach avoids the need to specify the number of possible outliers in advance; as long as they are not too numerous, any outliers do not affect the location of the cutoffs. To describe the performance of these rules, we define the some-outside rate per sample as the probability that a sample will contain one or more outside observations. Its complement is the all-inside rate per sample. We also define the outside rate per observation as the average fraction of outs...
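The worked example in this abstract can be reproduced with a short sketch. It takes the fourths at depth f = ½[(n + 3)/2], the standard definition quoted in the 1987 abstract in this listing, and averages two adjacent order statistics when the depth ends in .5. The function names are hypothetical.

```python
import math


def standard_fourths(data):
    """Lower and upper fourths at depth f = (1/2) * floor((n + 3) / 2)."""
    xs = sorted(data)
    n = len(xs)
    f = 0.5 * math.floor((n + 3) / 2)

    def at(depth):
        lo = math.floor(depth)
        if depth == lo:
            return xs[lo - 1]
        # Depth ends in .5: average the two neighbouring order statistics.
        return (xs[lo - 1] + xs[lo]) / 2

    return at(f), at(n + 1 - f)


def label_outside(data, k=1.5):
    """Return the two cutoffs and the observations labelled "outside"."""
    lower_fourth, upper_fourth = standard_fourths(data)
    spread = upper_fourth - lower_fourth
    low_cutoff = lower_fourth - k * spread
    high_cutoff = upper_fourth + k * spread
    outside = [x for x in data if x < low_cutoff or x > high_cutoff]
    return low_cutoff, high_cutoff, outside


# The sample from the abstract.
print(label_outside([-5, -2, 0, 1, 8]))  # cutoffs -6.5 and 5.5; 8 is outside
```

Because the cutoffs depend only on the fourths, moving the extreme value 8 further out leaves them unchanged, which is the resistance property the abstract describes.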


The American Statistician | 1989

Some Implementations of the Boxplot

Michael Frigge; David C. Hoaglin; Boris Iglewicz

Abstract An increasing number of statistical software packages offer exploratory data displays and summaries. For one of these, the graphical technique known as the boxplot, a selective survey of popular software packages revealed several definitions. These alternative constructions arise from different choices in computing quartiles and the fences that determine whether an observation is “outside” and thus plotted individually. We examine these alternatives and their consequences, discuss related background for boxplots (such as the probability that a sample contains one or more outside observations and the average proportion of outside observations in a sample), and offer recommendations that lead to a single standard form of the boxplot.


American Journal of Preventive Medicine | 2001

Overview of the sampling design and statistical methods used in the National Immunization Survey

Philip J. Smith; Michael P. Battaglia; Vicki J Huggins; David C. Hoaglin; Ann-Sofi Rodén; Meena Khare; Trena M Ezzati-Rice; Robert A Wright

Abstract: The National Immunization Survey (NIS) is a large federally funded survey designed to estimate vaccination coverage rates for children residing in the United States aged 19 to 35 months. In 1999, over 8 million telephone call attempts were made to obtain provider-reported vaccination histories on 22,521 children in the age range of interest.


American Journal of Psychology | 1991

Fundamentals of Exploratory Analysis of Variance

Seth Roberts; David C. Hoaglin; Frederick Mosteller; John W. Tukey

Concepts and Examples in Analysis of Variance (J. Tukey, et al.)
Purposes of Analyzing Data that Come in a Form Inviting Us to Apply Tools from the Analysis of Variance (F. Mosteller & J. Tukey)
Preliminary Examination of Data (F. Mosteller & D. Hoaglin)
Types of Factors and Their Structural Layouts (J. Singer)
Value-Splitting: Taking the Data Apart (C. Schmid)
Value-Splitting Involving More Factors (K. Halvorsen)
Mean Squares, F Tests, and Estimates of Variance (F. Mosteller, et al.)
Graphical Display as an Aid to Analysis (J. Emerson)
Components of Variance (C. Brown & F. Mosteller)
Which Denominator? (T. Blackwell, et al.)
Assessing Changes (J. Tukey, et al.)
Qualitative and Quantitative Confidence (J. Tukey & D. Hoaglin)
Introduction to Transformation (J. Emerson)
Appendix
Index


The American Statistician | 1980

A Poissonness Plot

David C. Hoaglin

Abstract A graphical technique, similar in spirit to probability plotting, can be used to judge whether a Poisson model is appropriate for an observed frequency distribution. This “Poissonness plot” can equally be applied to truncated Poisson situations. It provides a type of robustness for detecting isolated discrepancies in otherwise well-behaved frequency distributions.
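A minimal sketch of the plotting positions behind such a display follows. It assumes the usual construction, in which log(k! · n_k / N) is plotted against the count k; under a Poisson model with mean λ these points fall on a line with slope log λ and intercept −λ. The abstract itself does not state the formula, and the frequency table and helper names below are made up for illustration.

```python
import math


def poissonness_points(freq):
    """Plotting positions (k, log(k! * n_k / N)) for a frequency table.

    freq maps each count k to its observed frequency n_k; zero frequencies are skipped.
    """
    total = sum(freq.values())
    return [
        (k, math.log(math.factorial(k) * n_k / total))
        for k, n_k in sorted(freq.items())
        if n_k > 0
    ]


def fit_line(points):
    """Ordinary least-squares slope and intercept through (k, value) points."""
    n = len(points)
    mean_k = sum(k for k, _ in points) / n
    mean_v = sum(v for _, v in points) / n
    sxx = sum((k - mean_k) ** 2 for k, _ in points)
    sxy = sum((k - mean_k) * (v - mean_v) for k, v in points)
    slope = sxy / sxx
    return slope, mean_v - slope * mean_k


# Made-up frequency table: count k -> number of units observed with that count.
observed = {0: 14, 1: 27, 2: 27, 3: 18, 4: 9, 5: 4, 6: 1}

points = poissonness_points(observed)
slope, intercept = fit_line(points)

# If the points look straight, exp(slope) is a rough estimate of the Poisson mean.
print(points)
print(math.exp(slope), intercept)
```

An unweighted line is used here only for brevity; a point that departs clearly from the line flags an isolated discrepancy in that cell.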


The American Statistician | 1995

A Critical Look at Some Analyses of Major League Baseball Salaries

David C. Hoaglin; Paul F. Velleman

Abstract At a data analysis exposition sponsored by the Section on Statistical Graphics of the ASA in 1988, 15 groups of statisticians analyzed the same data about salaries of major league baseball players. By examining what they did, what worked, and what failed, we can begin to learn about the relative strengths and weaknesses of different approaches to analyzing data. The data are rich in difficulties. They require reexpression, contain errors and outliers, and exhibit nonlinear relationships. They thus pose a realistic challenge to the variety of data analysis techniques used. The analysis groups chose a wide range of model-fitting methods, including regression, principal components, factor analysis, time series, and CART. We thus have an effective framework for comparing these approaches so that we can learn more about them. Our examination shows that approaches commonly identified with Exploratory Data Analysis are substantially more effective at revealing the underlying patterns in the data and at ...


Journal of Quality Technology | 1987

Use of Boxplots for Process Evaluation

Boris Iglewicz; David C. Hoaglin

The use of boxplots in place of single points in a quality control chart can provide an effective display of the information usually given in X-bar and R charts, show the degree of compliance with specifications and identify outliers. An example from a ..


Journal of the American Statistical Association | 1984

Leverage in Least Squares Additive-Plus-Multiplicative Fits for Two-Way Tables

John D. Emerson; David C. Hoaglin; Peter J. Kempthorne

Abstract An additive-plus-multiplicative model can describe both main effects and row × column interactions in two-way tables of data. When each cell contains exactly one observation, a least squares fit for this nonlinear model calculates the main effects, using means of rows and columns, and then fits a multiplicative term to the additive residuals, using the singular value decomposition. A natural extension of the hat matrix for a linear model yields a definition of leverage that provides insights about the impact of erroneous data values on the fit. Theoretical and numerical investigations reveal the complex nature of leverage for this nonlinear model.

Collaboration


Dive into David C. Hoaglin's collaborations.

Top Co-Authors

Martin R. Frankel

City University of New York


Trena M Ezzati-Rice

Centers for Disease Control and Prevention


Judith M. Tanur

State University of New York System


Philip J. Smith

Centers for Disease Control and Prevention
