Publication


Featured research published by Ruben H. Zamar.


Journal of the American Statistical Association | 1988

High Breakdown-Point Estimates of Regression by Means of the Minimization of an Efficient Scale.

Victor J. Yohai; Ruben H. Zamar

A new class of robust estimates, τ estimates, is introduced. The estimates have simultaneously the following properties: (a) they are qualitatively robust, (b) their breakdown point is 0.5, and (c) they are highly efficient for regression models with normal errors. They are defined by minimizing a new scale estimate, τ, applied to the residuals. Asymptotically, a τ estimate is equivalent to an M estimate with a ψ function given by a weighted average of two ψ functions, one corresponding to a very robust estimate and the other to a highly efficient estimate. The weights are adaptive and depend on the underlying error distribution. We prove consistency and asymptotic normality and give a convergent iterative computing algorithm. Finally, we compare the biases produced by gross error contamination in the τ estimates and optimal bounded-influence estimates.
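The construction centers on the τ scale of the residuals. Below is a minimal numerical sketch of that scale, assuming Tukey bisquare ρ functions; the tuning constants (c1 for a 0.5-breakdown M-scale, c2 for high efficiency) and all function names are illustrative, not taken from the paper.

import numpy as np

def rho_bisquare(u, c):
    # Tukey bisquare rho, rescaled so that rho(u) -> 1 for |u| >= c
    v = np.minimum((u / c) ** 2, 1.0)
    return 3 * v - 3 * v**2 + v**3

def m_scale(r, c1=1.548, b=0.5, n_iter=50):
    # M-scale s solving (1/n) sum rho1(r_i / s) = b; b = 0.5 gives breakdown 0.5
    s = np.median(np.abs(r)) / 0.6745  # MAD starting value
    for _ in range(n_iter):
        s *= np.sqrt(np.mean(rho_bisquare(r / s, c1)) / b)
    return s

def tau_scale(r, c1=1.548, c2=6.08):
    # Returns tau, where tau^2 = s^2 * (1/n) sum rho2(r_i / s)
    # and s is the 0.5-breakdown M-scale of the residuals
    s = m_scale(r, c1)
    return s * np.sqrt(np.mean(rho_bisquare(r / s, c2)))

# A tau estimate of regression minimizes tau_scale(y - X @ beta) over beta.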


Statistics & Probability Letters | 1997

A multivariate Kolmogorov-Smirnov test of goodness of fit

Ana Justel; Daniel Peña; Ruben H. Zamar

This paper presents a distribution-free multivariate Kolmogorov-Smirnov goodness-of-fit test. The test uses a statistic which is built using Rosenblatt's transformation, and an algorithm is developed to compute it in the bivariate case. An approximate test, which can be easily computed in any dimension, is also presented. The power of these multivariate tests is assessed in a simulation study.
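To illustrate the construction (this is not the paper's exact algorithm), here is a sketch for a fully specified standard bivariate normal null with correlation rho: the Rosenblatt transformation carries the sample to the unit square, where it is i.i.d. uniform under the null, and an approximate KS-type statistic compares the bivariate empirical CDF with the uniform CDF at the sample points only. Critical values would be obtained by simulation.

import numpy as np
from scipy.stats import norm

def rosenblatt_bvn(x, rho):
    # Rosenblatt transform for a standard bivariate normal with correlation rho:
    # u1 = Phi(x1), u2 = Phi of x2 given x1; under H0 (u1, u2) ~ Uniform(0,1)^2
    u1 = norm.cdf(x[:, 0])
    u2 = norm.cdf((x[:, 1] - rho * x[:, 0]) / np.sqrt(1.0 - rho**2))
    return np.column_stack([u1, u2])

def ks_stat_bivariate(u):
    # Approximate KS statistic: largest gap between the bivariate empirical
    # CDF and the uniform CDF u1*u2, evaluated only at the sample points
    n = len(u)
    ecdf = np.array([np.mean(np.all(u <= u[i], axis=1)) for i in range(n)])
    return np.max(np.abs(ecdf - u[:, 0] * u[:, 1]))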


Technometrics | 2002

Robust Estimates of Location and Dispersion for High-Dimensional Datasets

Ricardo A. Maronna; Ruben H. Zamar

The computing times of high-breakdown point estimates of multivariate location and scatter increase rapidly with the number of variables, which makes them impractical for high-dimensional datasets, such as those used in data mining. We propose an estimator of location and scatter based on a modified version of the Gnanadesikan–Kettenring robust covariance estimate. We compare its behavior with that of the Stahel–Donoho (SD) and Rousseeuw and Van Driessen's fast MCD (FMCD) estimates. In simulations with contaminated multivariate normal data, our estimate is almost as good as SD and clearly better than FMCD. It is much faster than both, especially for large dimensions. We give examples with real data with dimensions between 5 and 93, in which the proposed estimate is as good as or better than SD and FMCD at detecting outliers and other structures, with much shorter computing times.
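The Gnanadesikan–Kettenring device underlying the estimator writes a covariance as a difference of two squared scales, so any robust scale yields a robust pairwise covariance. A minimal sketch with the MAD as the robust scale; the orthogonalization step that the full estimator uses to repair positive definiteness is omitted, and the names are ours, not the paper's.

import numpy as np

def mad(x):
    # Median absolute deviation, scaled for consistency at the normal
    return 1.4826 * np.median(np.abs(x - np.median(x)))

def gk_cov(x, y, scale=mad):
    # Gnanadesikan-Kettenring identity:
    # cov(X, Y) = (sigma(X + Y)^2 - sigma(X - Y)^2) / 4, with a robust sigma
    return (scale(x + y) ** 2 - scale(x - y) ** 2) / 4.0

def pairwise_cov_matrix(X):
    # Assemble a pairwise robust covariance matrix: O(p^2) pairs,
    # each costing roughly O(n), hence fast in high dimensions
    p = X.shape[1]
    C = np.empty((p, p))
    for j in range(p):
        C[j, j] = mad(X[:, j]) ** 2
        for k in range(j + 1, p):
            C[j, k] = C[k, j] = gk_cov(X[:, j], X[:, k])
    return C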


Archive | 1991

A Procedure for Robust Estimation and Inference in Linear Regression

Victor J. Yohai; Werner A. Stahel; Ruben H. Zamar

Although robust regression estimators have been around for nearly 20 years, they have not found widespread application. One obstacle is the diversity of estimator types and the necessary choices of tuning constants, combined with a lack of guidance for these decisions. While some participants of the IMA summer program have argued that these choices should always be made in view of the specific problem at hand, we propose a procedure which should fit many purposes reasonably well. A second obstacle is the lack of simple procedures for inference, or the reluctance to use the straightforward inference based on asymptotics.


Journal of the American Statistical Association | 2007

Robust Linear Model Selection Based on Least Angle Regression

Jafar A. Khan; Stefan Van Aelst; Ruben H. Zamar

In this article we consider the problem of building a linear prediction model when the number of candidate predictors is large and the data possibly contain anomalies that are difficult to visualize and clean. We want to predict the nonoutlying cases; therefore, we need a method that is simultaneously robust and scalable. We consider the stepwise least angle regression (LARS) algorithm which is computationally very efficient but sensitive to outliers. We introduce two different approaches to robustify LARS. The plug-in approach replaces the classical correlations in LARS by robust correlation estimates. The cleaning approach first transforms the data set by shrinking the outliers toward the bulk of the data (which we call multivariate Winsorization) and then applies LARS to the transformed data. We show that the plug-in approach is time-efficient and scalable and that the bootstrap can be used to stabilize its results. We recommend using bootstrapped robustified LARS to sequence a number of candidate predictors to form a reduced set from which a more refined model can be selected.
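A sketch of the cleaning idea in its simplest univariate form (the paper's multivariate Winsorization instead shrinks outliers using bivariate tolerance ellipses; the constant c and the names here are illustrative):

import numpy as np

def winsorize_univariate(x, c=2.0):
    # Shrink outliers toward the bulk: robustly standardize with the
    # median and MAD, truncate at +/- c, and map back to the original scale
    med = np.median(x)
    s = 1.4826 * np.median(np.abs(x - med))
    z = np.clip((x - med) / s, -c, c)
    return med + s * z

def robust_corr_matrix(X, c=2.0):
    # Classical correlations computed on coordinate-wise Winsorized data
    # can stand in for the robust correlations used by the plug-in approach
    W = np.column_stack([winsorize_univariate(X[:, j], c) for j in range(X.shape[1])])
    return np.corrcoef(W, rowvar=False)

Running LARS on the transformed data, or feeding robust correlations directly into LARS (the plug-in approach), then sequences the candidate predictors as described above.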


Knowledge Discovery and Data Mining | 2002

Scalable robust covariance and correlation estimates for data mining

Fatemah Alqallaf; Kjell P. Konis; R. Douglas Martin; Ruben H. Zamar

Covariance and correlation estimates have important applications in data mining. In the presence of outliers, classical estimates of covariance and correlation matrices are not reliable. A small fraction of outliers, in some cases even a single outlier, can distort the classical covariance and correlation estimates, making them virtually useless. That is, correlations for the vast majority of the data can be very erroneously reported; principal components transformations can be misleading; and multidimensional outlier detection via Mahalanobis distances can fail to detect outliers. There is plenty of statistical literature on robust covariance and correlation matrix estimates with an emphasis on affine-equivariant estimators that possess high breakdown points and small worst case biases. All such estimators have unacceptable exponential complexity in the number of variables and quadratic complexity in the number of observations. In this paper we focus on several variants of robust covariance and correlation matrix estimates with quadratic complexity in the number of variables and linear complexity in the number of observations. These estimators are based on several forms of pairwise robust covariance and correlation estimates. The estimators studied include two fast estimators based on coordinate-wise robust transformations embedded in an overall procedure recently proposed by [14]. We show that the estimators have attractive robustness properties, and give an example that uses one of the estimators in the new Insightful Miner data mining product.
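One classical member of the pairwise family, with the cost profile the abstract describes (linear in the observations per pair, quadratic in the number of variables overall), is the quadrant correlation. The abstract does not say whether this exact variant is among those studied, so read it as an illustration of the pairwise idea.

import numpy as np

def quadrant_corr(x, y):
    # Average sign agreement about the coordinate-wise medians, mapped
    # through sin(pi/2 * s) for consistency at the bivariate normal
    s = np.mean(np.sign(x - np.median(x)) * np.sign(y - np.median(y)))
    return np.sin(0.5 * np.pi * s)

def pairwise_corr_matrix(X):
    # O(p^2) pairs, each roughly O(n): quadratic in the number of
    # variables, linear in the number of observations
    p = X.shape[1]
    R = np.eye(p)
    for j in range(p):
        for k in range(j + 1, p):
            R[j, k] = R[k, j] = quadrant_corr(X[:, j], X[:, k])
    return R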


Computational Statistics & Data Analysis | 2007

CLUES: A non-parametric clustering method based on local shrinking

Xiaogang Wang; Weiliang Qiu; Ruben H. Zamar

A novel non-parametric clustering method based on local shrinking is proposed. Each data point is transformed in such a way that it moves a specific distance toward a cluster center. The direction and the associated size of each movement are determined by the median of its K-nearest neighbors. This process is repeated until a pre-defined convergence criterion is satisfied. The optimal value of the number of neighbors is determined by optimizing some commonly used index functions that measure the strengths of clusters generated by the algorithm. The number of clusters and the final partition are determined automatically without any input parameter except the stopping rule for convergence. Experiments on simulated and real data sets suggest that the proposed algorithm achieves relatively high accuracies when compared with classical clustering algorithms.
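A minimal sketch of the shrinking iteration just described, with the number of neighbors k held fixed (the paper chooses it automatically by optimizing cluster-strength index functions); the names are illustrative:

import numpy as np

def shrink_step(X, k):
    # One local-shrinking pass: move each point to the coordinate-wise
    # median of its k nearest neighbors (each point is its own nearest neighbor)
    D = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=2)
    idx = np.argsort(D, axis=1)[:, :k]
    return np.array([np.median(X[nn], axis=0) for nn in idx])

def local_shrinking(X, k, tol=1e-6, max_iter=50):
    # Iterate until points stop moving; clusters emerge as groups of
    # points that have collapsed near common centers
    Y = X.copy()
    for _ in range(max_iter):
        Y_new = shrink_step(Y, k)
        if np.max(np.abs(Y_new - Y)) < tol:
            break
        Y = Y_new
    return Y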


Journal of Statistical Planning and Inference | 1997

Optimal locally robust M-estimates of regression

Victor J. Yohai; Ruben H. Zamar

First, we show that many robust estimates of regression which depend only on the regression residuals (including M-, S-, τ-, least median of squares, least trimmed squares and some R-estimates) have infinite gross-error sensitivity. More precisely, we show that the maximum-bias function of a large class of estimates, called residual admissible in Yohai and Zamar (Ann. Statist. 21, 1993, 1824–1842), is of order √ε near zero. Based on this finding we define a new robustness measure for estimates with B_T(ε) = O(ε^β), the contamination sensitivity of order β, which extends Hampel's gross-error sensitivity to estimates with unbounded influence. We compute this measure for regression M-estimates with a general scale and show that β = 0.5 in this case. Then we solve a Hampel-like optimality problem, namely, minimizing the asymptotic variance subject to a bound on the contamination sensitivity of order β = 0.5, for estimates in this class. Finally, we show that a certain least α-quantile estimate has the smallest contamination sensitivity of order 0.5 among all residual admissible estimates. In the Gaussian case α = 0.683.
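In symbols (notation reconstructed from the abstract; the exact definitions should be checked against the article): writing V_ε(F_0) for the ε-contamination neighborhood of the central model F_0,

\[
B_T(\varepsilon) \;=\; \sup_{F \in \mathcal{V}_{\varepsilon}(F_0)} \lVert T(F) - T(F_0) \rVert ,
\qquad
\gamma_{\beta}(T) \;=\; \lim_{\varepsilon \downarrow 0} \frac{B_T(\varepsilon)}{\varepsilon^{\beta}} .
\]

For residual admissible estimates B_T(ε) behaves like a constant times √ε near zero, so the classical gross-error sensitivity (the β = 1 rate) is infinite while the contamination sensitivity of order β = 0.5 is finite.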


Computational Statistics & Data Analysis | 2007

Building a robust linear model with forward selection and stepwise procedures

Jafar A. Khan; Stefan Van Aelst; Ruben H. Zamar

Classical step-by-step algorithms, such as forward selection (FS) and stepwise (SW) methods, are computationally cheap, but yield poor results when the data contain outliers and other contaminations. Robust model selection procedures, on the other hand, are not computationally efficient or scalable to large dimensions, because they require the fitting of a large number of submodels. Robust and computationally efficient versions of FS and SW are proposed. Since FS and SW can be expressed in terms of sample correlations, simple robustifications are obtained by replacing these correlations by their robust counterparts. A pairwise approach is used to construct the robust correlation matrix, not only because of its computational advantages over the d-dimensional approach, but also because the pairwise approach is more consistent with the idea of step-by-step algorithms. The proposed robust methods perform much better than standard FS and SW. They are also computationally efficient and scale to large, high-dimensional data sets.
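Because FS can be driven entirely by a correlation matrix, robustification amounts to swapping in a robust matrix. A minimal sketch, assuming some robust pairwise correlation estimator corr (for example, the quadrant correlations sketched earlier); the paper's exact stepwise updates may differ, and a pairwise matrix need not be positive definite, so regularization may be needed in practice.

import numpy as np

def forward_selection_robust(X, y, corr, n_steps):
    # Forward selection driven by a correlation matrix: at each step add
    # the predictor that most increases the (robust) squared multiple
    # correlation of y with the current model
    p = X.shape[1]
    R = corr(np.column_stack([X, y]))  # (p+1) x (p+1) robust correlation matrix
    active, inactive = [], list(range(p))
    for _ in range(n_steps):
        best, best_val = None, -1.0
        for j in inactive:
            idx = active + [j]
            Rxx = R[np.ix_(idx, idx)]
            rxy = R[idx, p]
            r2 = rxy @ np.linalg.solve(Rxx, rxy)  # squared multiple correlation
            if r2 > best_val:
                best, best_val = j, r2
        active.append(best)
        inactive.remove(best)
    return active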


Computational Statistics & Data Analysis | 2006

Linear grouping using orthogonal regression

Stefan Van Aelst; Xiaogang Wang; Ruben H. Zamar; Rong Zhu

A new method, called the Linear Grouping Algorithm (LGA), is proposed to detect different linear structures in a data set. LGA is useful for investigating potential linear patterns in data sets, that is, subsets that follow different linear relationships. LGA combines ideas from principal components, clustering methods and resampling algorithms. It can detect several different linear relations at once. Methods to determine the number of groups in the data are proposed. Diagnostic tools to investigate the results obtained from LGA are introduced. It is shown how LGA can be extended to detect groups characterized by lower-dimensional hyperplanes as well. Some applications illustrate the usefulness of LGA in practice.
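A sketch of the core reassign-and-refit descent (in LGA the starting hyperplanes come from resampled small subsets, and the best of many starts by total orthogonal distance is kept; the names here are illustrative):

import numpy as np

def fit_hyperplane(X):
    # Orthogonal regression: hyperplane through the centroid whose unit
    # normal is the direction of smallest variance (last principal component)
    mu = X.mean(axis=0)
    _, _, Vt = np.linalg.svd(X - mu, full_matrices=False)
    a = Vt[-1]
    return a, a @ mu  # hyperplane {x : a.x = b}

def lga_step(X, planes):
    # Assign each point to the hyperplane with the smallest orthogonal
    # distance, then refit each group (assumes every group stays non-empty)
    d = np.abs(np.stack([X @ a - b for a, b in planes], axis=1))
    labels = d.argmin(axis=1)
    planes = [fit_hyperplane(X[labels == g]) for g in range(len(planes))]
    return planes, labels

Iterating lga_step to convergence from many resampled starts and keeping the solution with the smallest total orthogonal distance gives the flavor of the algorithm; the number of groups is chosen by the criteria the paper proposes.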

Collaboration


Dive into Ruben H. Zamar's collaborations.

Top Co-Authors

Victor J. Yohai
University of Buenos Aires

Matias Salibian-Barrera
University of British Columbia

Stefan Van Aelst
University of British Columbia

Hongyang Zhang
University of British Columbia

José R. Berrendero
Autonomous University of Madrid

Andy Leung
University of British Columbia