Ricardo Cao
University of A Coruña
Publication
Featured research published by Ricardo Cao.
Computational Statistics & Data Analysis | 1994
Ricardo Cao; Antonio Cuevas; Wenceslao González-Manteiga
Abstract The theory of bandwidth choice in density estimation is developing very fast. Several methods (with plenty of varieties and subvarieties) have been recently proposed as an alternative to least squares cross-validation, the standard for years. This paper includes (a) A critical up-to-date review of the main methods currently available. The discussion provides some new insights on the important problem of estimating the minimization criteria and on the choice of pilot bandwidths in bootstrap-based methods. (b) An extensive simulation study of ten selected bandwidths. (c) A final discussion with some recommendations for practitioners. The conclusions are not easily summarized in a few words, because different cases have to be considered and important nuances must be pointed out. However, we could mention that the classical cross-validation bandwidths show, generally speaking, a relatively poor behavior (this is especially clear for the pseudo-likelihood method). On the other hand, although no selector appears to be uniformly better, the plug-in (in a similar version to that proposed by Sheather and Jones, J. Royal Statist. Soc. Ser. B 5 1991) and the (smoothed) bootstrap-based selectors show a fairly satisfactory performance which suggests that they could be the new standard methods for the problem of smoothing in density estimation. Interesting results are also obtained for a new type of bandwidths based on the number of inflection points.
Test | 1997
Luc Devroye; Jan Beirlant; Ricardo Cao; Ricardo Fraiman; Peter Hall; M. C. Jones; Gábor Lugosi; Enno Mammen; J. S. Marron; César Sánchez-Sellero; J. Uña; Frederic Udina
Abstract In earlier work with Gábor Lugosi, we introduced a method to select a smoothing factor for kernel density estimation such that, for all densities in all dimensions, the L1 error of the corresponding kernel estimate is not larger than 3+ε times the error of the estimate with the optimal smoothing factor plus a constant times
Journal of the American Statistical Association | 1996
Wenceslao González-Manteiga; Ricardo Cao; J. S. Marron
Annals of Human Genetics | 2009
Manuel García-Magariños; Ignacio López-de-Ullibarri; Ricardo Cao; Antonio Salas
\sqrt {\log n/n}
Test | 1993
Wenceslao González-Manteiga; Ricardo Cao
International Journal of Legal Medicine | 1993
Emilio Valverde; Carmen Cabrero; Ricardo Cao; M. S. Rodríguez-Calvo; A. Díez; Francisco Barros; Jorge Alemany; Angel Carracedo
, where n is the sample size, and the constant only depends on the complexity of the kernel used in the estimate. The result is nonasymptotic, that is, the bound is valid for each n. The estimate uses ideas from the minimum distance estimation work of Yatracos. We present a practical implementation of this estimate, report on some comparative results, and highlight some key properties of the new method.
Technometrics | 1995
Ignacio García-Jurado; Wenceslao González-Manteiga; J. M. Prada-Sánchez; Manuel Febrero-Bande; Ricardo Cao
Abstract An asymptotic representation of the mean weighted integrated squared error for the kernel-based estimator of the hazard rate in the presence of right-censored samples is obtained for different bootstrap resampling methods. As a consequence, a new bandwidth selector based on the bootstrap is introduced. Very satisfactory simulation results are obtained in comparison to the cross-validation selector for different models, using WARPed (i.e., binned) versions of the estimators.
Stochastic Processes and their Applications | 1999
Ángeles Saavedra; Ricardo Cao
Most common human diseases are likely to have complex etiologies. Methods of analysis that allow for the phenomenon of epistasis are of growing interest in the genetic dissection of complex diseases. By allowing for epistatic interactions between potential disease loci, we may succeed in identifying genetic variants that might otherwise have remained undetected. Here we aimed to analyze the ability of logistic regression (LR) and two tree‐based supervised learning methods, classification and regression trees (CART) and random forest (RF), to detect epistasis. Multifactor‐dimensionality reduction (MDR) was also used for comparison. Our approach involves first the simulation of datasets of autosomal biallelic unphased and unlinked single nucleotide polymorphisms (SNPs), each containing a two‐loci interaction (causal SNPs) and 98 ‘noise’ SNPs. We modelled interactions under different scenarios of sample size, missing data, minor allele frequencies (MAF) and several penetrance models: three involving both (indistinguishable) marginal effects and interaction, and two simulating pure interaction effects. In total, we have simulated 99 different scenarios. Although CART, RF, and LR yield similar results in terms of detection of true association, CART and RF perform better than LR with respect to classification error. MAF, penetrance model, and sample size are greater determining factors than percentage of missing data in the ability of the different techniques to detect true association. In pure interaction models, only RF detects association. In conclusion, tree‐based methods and LR are important statistical tools for the detection of unknown interactions among true risk‐associated SNPs with marginal effects and in the presence of a significant number of noise SNPs. In pure interaction models, RF performs reasonably well in the presence of large sample sizes and low percentages of missing data. 
However, when the study design is suboptimal (unfavourable to detect interaction in terms of e.g. sample size and MAF) there is a high chance of detecting false, spurious associations.
Computational Statistics & Data Analysis | 1995
Ricardo Cao; Antonio Cuevas; Ricardo Fraiman
Summary Given the model $Y_i = m(X_i) + \varepsilon_i$, where $E(\varepsilon_i) = 0$, $X_i \in C$, $i = 1, \ldots, n$, and $C$ is a $p$-dimensional compact set, we have designed a new method for testing the hypothesis that the regression function follows a general linear model, $m(\cdot) \in \{m_\theta(\cdot) = A^t(\cdot)\theta\}_{\theta \in \Theta \subset \mathbb{R}^q}$, with $A$ a function from $\mathbb{R}^p$ to $\mathbb{R}^q$. The statistic, denoted ΔASE, used for testing the given hypothesis is defined to be the difference between the average squared errors (ASE) associated with the non-parametric estimator
Journal of Nonparametric Statistics | 2005
Ricardo Cao; Ignacio López-de-Ullibarri; Paul Janssen; Noël Veraverbeke