Ana I. Vazquez
Michigan State University
Publications
Featured research published by Ana I. Vazquez.
PLOS Genetics | 2011
Robert Makowsky; Nicholas M. Pajewski; Yann C. Klimentidis; Ana I. Vazquez; Christine W. Duarte; David B. Allison; Gustavo de los Campos
Despite rapid advances in genomic technology, our ability to account for phenotypic variation using genetic information remains limited for many traits. This has unfortunately resulted in limited application of genetic data towards preventive and personalized medicine, one of the primary impetuses of genome-wide association studies. Recently, a large proportion of the "missing heritability" for human height was statistically explained by modeling thousands of single nucleotide polymorphisms concurrently. However, it is currently unclear how gains in explained genetic variance will translate to the prediction of yet-to-be observed phenotypes. Using data from the Framingham Heart Study, we explore the genomic prediction of human height in training and validation samples while varying the statistical approach used, the number of SNPs included in the model, the validation scheme, and the number of subjects used to train the model. In our training datasets, we are able to explain a large proportion of the variation in height (h² up to 0.83, R² up to 0.96). However, the proportion of variance accounted for in validation samples is much smaller (ranging from 0.15 to 0.36 depending on the degree of familial information used in the training dataset). While such R² values vastly exceed what has been previously reported using a reduced number of pre-selected markers (<0.10), given the heritability of the trait (∼0.80), substantial room for improvement remains.
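The contrast between training fit and validation accuracy can be reproduced in a small simulation. The Python sketch below is an illustration under stated assumptions, not the study's analysis: it simulates 2,000 independent SNPs with additive effects and a heritability of 0.8 for 600 hypothetical individuals, fits a ridge-regression (SNP-BLUP-type) model to 400 of them with a fixed penalty, and compares the squared correlation between observed and predicted phenotypes in the training and validation samples.

```python
import numpy as np

def r2(y, yhat):
    """Squared Pearson correlation between observed and predicted phenotypes."""
    return float(np.corrcoef(y, yhat)[0, 1] ** 2)

rng = np.random.default_rng(2011)
n, p, h2 = 600, 2000, 0.8                      # hypothetical sizes and heritability

# Simulated 0/1/2 genotypes (allele frequency 0.5) and small effects at every SNP.
X = rng.binomial(2, 0.5, size=(n, p)).astype(float)
g = X @ rng.normal(0.0, 1.0, size=p)
g = (g - g.mean()) / g.std()                   # genetic values with unit variance
y = g + rng.normal(0.0, np.sqrt((1 - h2) / h2), size=n)   # var(g) / var(y) is about h2

train, valid = np.arange(400), np.arange(400, n)
centers = X[train].mean(axis=0)                # center with training means only
Xt, Xv = X[train] - centers, X[valid] - centers

# Ridge (SNP-BLUP-type) solution in its n x n form: b = X'(XX' + lam I)^-1 (y - mu).
# lam = sigma2_e / sigma2_b, where sigma2_b is about 2 / p in this simulation.
lam = ((1 - h2) / h2) / (2.0 / p)
mu = y[train].mean()
alpha = np.linalg.solve(Xt @ Xt.T + lam * np.eye(len(train)), y[train] - mu)
b = Xt.T @ alpha

r2_train = r2(y[train], mu + Xt @ b)
r2_valid = r2(y[valid], mu + Xv @ b)
print(f"training R2 = {r2_train:.2f}, validation R2 = {r2_valid:.2f}")
```

Because the number of markers far exceeds the number of training records, the model reproduces the training phenotypes closely while the validation R² stays much lower; the exact values depend on the simulated sizes and the assumed penalty.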
PLOS Genetics | 2013
Gustavo de los Campos; Ana I. Vazquez; Rohan L. Fernando; Yann C. Klimentidis; Danny C. Sorensen
Despite important advances from Genome-Wide Association Studies (GWAS), for most complex human traits and diseases a sizable proportion of genetic variance remains unexplained, and prediction accuracy (PA) is usually low. Evidence suggests that PA can be improved using Whole-Genome Regression (WGR) models, in which phenotypes are regressed on hundreds of thousands of variants simultaneously. Genomic Best Linear Unbiased Prediction (G-BLUP, a ridge-regression-type method) is a commonly used WGR method and has shown good predictive performance when applied to plant and animal breeding populations. However, breeding and human populations differ greatly in a number of factors that can affect the predictive performance of G-BLUP. Using theory, simulations, and real data analysis, we study the performance of G-BLUP when applied to data from related and unrelated human subjects. Under perfect linkage disequilibrium (LD) between markers and QTL, the prediction R-squared (R²) of G-BLUP asymptotically reaches the trait heritability. However, under imperfect LD between markers and QTL, prediction R² based on G-BLUP has a much lower upper bound. We show that the minimum decrease in prediction accuracy caused by imperfect LD between markers and QTL is given by (1 − b)², where b is the regression of marker-derived genomic relationships on those realized at causal loci. For pairs of related individuals, because of within-family disequilibrium, the patterns of realized genomic similarity are similar across the genome; therefore b is close to one, inducing only a small decrease in R². However, with distantly related individuals b reaches very low values, imposing a very low upper bound on prediction R². Our simulations suggest that for the analysis of data from unrelated individuals, the asymptotic upper bound on R² may be of the order of 20% of the trait heritability. We show how PA can be enhanced with the use of variable selection or differential shrinkage of estimates of marker effects.
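The quantity b can be computed directly when causal-locus genotypes are known, as in a simulation. The sketch below assumes a hypothetical LD model (each marker allele copies its causal allele with probability `copy_prob`, otherwise it is drawn at random), unrelated individuals, and VanRaden-type genomic relationships; it estimates b from off-diagonal pairs and reports (1 − b)². The function names are illustrative and do not come from the paper.

```python
import numpy as np

def genomic_relationships(M):
    """VanRaden-type G = ZZ' / (2 * sum p(1 - p)) from an n x m matrix of 0/1/2 genotypes."""
    freq = M.mean(axis=0) / 2.0
    Z = M - 2.0 * freq
    return Z @ Z.T / (2.0 * np.sum(freq * (1.0 - freq)))

def relationship_slope(G_markers, G_causal):
    """b: slope of marker-based on causal-locus relationships over off-diagonal pairs."""
    iu = np.triu_indices_from(G_causal, k=1)
    x, y = G_causal[iu], G_markers[iu]
    return float(np.cov(x, y)[0, 1] / np.var(x, ddof=1))

def simulate(n, q, copy_prob, rng):
    """Hypothetical LD model for unrelated individuals: each marker allele copies its
    causal-locus allele with probability copy_prob, otherwise it is drawn at random."""
    qtl_hap = rng.binomial(1, 0.5, size=(2, n, q))
    random_hap = rng.binomial(1, 0.5, size=(2, n, q))
    marker_hap = np.where(rng.random((2, n, q)) < copy_prob, qtl_hap, random_hap)
    return qtl_hap.sum(axis=0).astype(float), marker_hap.sum(axis=0).astype(float)

rng = np.random.default_rng(2013)
slopes = {}
for copy_prob in (1.0, 0.5):
    qtl, markers = simulate(200, 500, copy_prob, rng)
    b = relationship_slope(genomic_relationships(markers), genomic_relationships(qtl))
    slopes[copy_prob] = b
    print(f"copy_prob={copy_prob}: b = {b:.2f}, (1 - b)^2 = {(1 - b) ** 2:.2f}")
```

With `copy_prob = 1.0` the markers are the causal loci, so b equals 1; with `copy_prob = 0.5` the marker-QTL genotype correlation is about 0.5 and b drops to roughly 0.25 in this setup, which illustrates why unrelated individuals face a low ceiling on prediction R².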
Journal of Animal Science | 2010
Ana I. Vazquez; D.M. Bates; Guilherme J. M. Rosa; Daniel Gianola; K.A. Weigel
Mixed models have been used extensively in quantitative genetics to study continuous and discrete traits. A standard quantitative genetic model proposes that the effects of levels of some random factor (e.g., sire) are correlated according to their genetic relationships. For this reason, the mixed-model routines available in standard packages cannot be used for genetic analysis. The R package pedigreemm was developed as an extension of the lme4 package and allows mixed models with correlated random effects to be fitted for Gaussian, binary, and count responses. Following the method of Harville and Callanan (1989), a correlation between levels of the grouping factor (e.g., sire) is induced by post-multiplying the incidence matrix of the levels of this random factor by the Cholesky factor of the corresponding (co)variance matrix (e.g., the numerator relationship matrix between sires). Estimation methods available in pedigreemm include approximations to maximum likelihood and REML. This note describes the classes of models that can be fitted using pedigreemm and presents examples that illustrate its use.
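The Harville and Callanan device can be checked numerically. The NumPy sketch below is an independent illustration, not pedigreemm's implementation: it assumes a hypothetical pedigree of 3 sires (two of them paternal half-sibs) with 6 records, takes the lower-triangular Cholesky factor L of the relationship matrix A, and confirms that Z L reproduces the covariance structure Z A Z' of the correlated sire effects.

```python
import numpy as np

# Hypothetical example: 3 sires (sires 0 and 1 are paternal half-sibs) and 6 records.
A = np.array([[1.00, 0.25, 0.00],
              [0.25, 1.00, 0.00],
              [0.00, 0.00, 1.00]])          # numerator relationship matrix between sires
sire_of_record = np.array([0, 0, 1, 1, 2, 2])
Z = np.eye(3)[sire_of_record]               # incidence matrix: records x sires

L = np.linalg.cholesky(A)                   # lower-triangular factor, A = L L'
Z_star = Z @ L                              # post-multiplied incidence matrix

# With u_star ~ N(0, sigma2_s * I), the sire effects u = L u_star have covariance
# sigma2_s * A, so Z_star u_star = Z u reproduces the pedigree-correlated model.
rng = np.random.default_rng(1989)
u_star = rng.normal(0.0, 1.0, size=3)
u = L @ u_star
print(np.allclose(Z_star @ Z_star.T, Z @ A @ Z.T))
```

The practical benefit is that the transformed model has independent random effects, so existing mixed-model machinery can fit it; sire effects on the original scale are recovered afterwards as u = L u*.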
Journal of Dairy Science | 2010
K.A. Weigel; G. de los Campos; Ana I. Vazquez; Guilherme J. M. Rosa; Daniel Gianola; C.P. Van Tassell
The objective of the present study was to evaluate the predictive ability of direct genomic values for economically important dairy traits when genotypes at some single nucleotide polymorphism (SNP) loci were imputed rather than measured directly. Genotypic data consisted of 42,552 SNP genotypes for each of 1,762 Jersey sires. Phenotypic data consisted of predicted transmitting abilities (PTA) for milk yield, protein percentage, and daughter pregnancy rate from May 2006 for 1,446 sires in the training set and from April 2009 for 316 sires in the testing set. The SNP effects were estimated using the Bayesian least absolute shrinkage and selection operator (LASSO) method with data of sires in the training set, and direct genomic values (DGV) for sires in the testing set were computed by multiplying these estimates by the corresponding genotype dosages for sires in the testing set. The mean correlation across traits between DGV (before progeny testing) and PTA (after progeny testing) for sires in the testing set was 70.6% when all 42,552 SNP genotypes were used. When genotypes for 93.1, 96.6, 98.3, or 99.1% of loci were masked and subsequently imputed in the testing set, mean correlations across traits between DGV and PTA were 68.5, 64.8, 54.8, or 43.5%, respectively. When genotypes were also masked and imputed for a random 50% of sires in the training set, mean correlations across traits between DGV and PTA were 65.7, 63.2, 53.9, or 49.5%, respectively. Results of this study indicate that if a suitable reference population with high-density genotypes is available, a low-density chip comprising 3,000 equally spaced SNP may provide approximately 95% of the predictive ability observed with the BovineSNP50 Beadchip (Illumina Inc., San Diego, CA) in Jersey cattle. However, if fewer than 1,500 SNP are genotyped, the accuracy of DGV may be limited by errors in the imputed genotypes of selection candidates.
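Once dosages are available, computing DGV reduces to a matrix-vector product. The sketch below assumes that imputed genotypes are summarized as probabilities of carrying 0, 1, or 2 copies of the counted allele and converts them to expected dosages; the toy probabilities and SNP effects are hypothetical, and the helper functions are illustrative rather than part of the software used in the study.

```python
import numpy as np

def expected_dosage(genotype_probs):
    """Expected allele count from genotype probabilities with shape (animals, SNPs, 3),
    where the last axis holds P(0), P(1), P(2) copies of the counted allele."""
    return genotype_probs[..., 1] + 2.0 * genotype_probs[..., 2]

def direct_genomic_values(dosages, snp_effects):
    """DGV_i = sum_j dosage_ij * effect_j."""
    return dosages @ snp_effects

def predictive_correlation(dgv, pta):
    """Pearson correlation between DGV and later (progeny-test) PTA."""
    return float(np.corrcoef(dgv, pta)[0, 1])

# Toy, hypothetical numbers: 2 testing sires, 3 SNPs, effects estimated in a training set.
probs = np.array([
    [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.2, 0.8]],   # sire A (last SNP imputed)
    [[0.0, 0.0, 1.0], [0.1, 0.8, 0.1], [0.9, 0.1, 0.0]],   # sire B (SNPs 2 and 3 imputed)
])
effects = np.array([0.5, -1.0, 0.25])
dosages = expected_dosage(probs)            # by hand: [0, 1, 1.8] and [2, 1, 0.1]
dgv = direct_genomic_values(dosages, effects)   # by hand: -0.55 and 0.025
```

With DGV and PTA for all testing sires, `predictive_correlation(dgv, pta)` gives the kind of DGV-PTA correlation reported above; using expected dosages rather than best-guess genotypes is one common way to carry imputation uncertainty into the prediction.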
Journal of Dairy Science | 2009
Ana I. Vazquez; Daniel Gianola; D.M. Bates; K.A. Weigel; B. Heringstad
Clinical mastitis is typically coded as presence/absence during some period of exposure, and records are analyzed with linear or binary data models. Because presence includes cows with multiple episodes, information is lost when a count is treated as a binary response. The Poisson model is designed for count random variables, and although it is used extensively in the epidemiology of mastitis, it has rarely been used for studying the genetics of mastitis. Many models have been proposed for genetic analysis of mastitis, but they have not been formally compared. The main goal of this study was to compare linear (Gaussian), Bernoulli (with logit link), and Poisson models for the genetic evaluation of sires for mastitis in dairy cattle. The response variables were clinical mastitis (CM; 0, 1) and number of CM cases (NCM; 0, 1, 2, ...). Data consisted of records on 36,178 first-lactation daughters of 245 Norwegian Red sires distributed over 5,286 herds. Predictive ability of the models was assessed via 3-fold cross-validation using mean squared error of prediction (MSEP) as the end point. Between-sire variance estimates for NCM were 0.065 in the Poisson and 0.007 in the linear model. For CM, the between-sire variance was 0.093 in the logit and 0.003 in the linear model. The ratio between herd and sire variances was 4.6 (Poisson) and 3.5 (linear) for the models with the NCM response, and 3.7 for both the logit and linear models with the CM response. The MSEP for all cows was similar across models. However, within healthy animals, MSEP was 0.085 (Poisson), 0.090 (linear for NCM), 0.053 (logit), and 0.056 (linear for CM). For mastitic animals, the MSEP values were 1.206 (Poisson), 1.185 (linear for NCM response), 1.333 (logit), and 1.319 (linear for CM response). The models for count variables performed better when predicting diseased animals and performed similarly to each other. Logit and linear models for CM had better predictive ability for healthy cows and performed similarly to each other.
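Splitting MSEP by health status, as done above, only requires out-of-fold predictions and a mask on the observed response. The sketch below is hypothetical throughout: it simulates Poisson case counts for daughters of 50 sires and uses a shrunken sire mean with an assumed variance ratio as a stand-in for the fitted linear, logit, or Poisson models, which it does not implement.

```python
import numpy as np

def kfold_ids(n, k, rng):
    """Randomly assign n records to k folds of (nearly) equal size."""
    return rng.permutation(np.arange(n) % k)

def msep(y, yhat):
    """Mean squared error of prediction."""
    return float(np.mean((y - yhat) ** 2))

def msep_by_status(y, yhat):
    """MSEP for all cows, healthy cows (no cases), and diseased cows (one or more cases)."""
    healthy = y == 0
    return {"all": msep(y, yhat),
            "healthy": msep(y[healthy], yhat[healthy]),
            "diseased": msep(y[~healthy], yhat[~healthy])}

# Hypothetical data: Poisson case counts for daughters of 50 sires.
rng = np.random.default_rng(2009)
n_sires, n = 50, 3000
sire = rng.integers(0, n_sires, size=n)
y = rng.poisson(np.exp(-2.0 + rng.normal(0.0, 0.3, size=n_sires))[sire]).astype(float)

k, lam = 3, 20.0        # 3 folds; lam is an assumed residual-to-sire variance ratio
folds = kfold_ids(n, k, rng)
yhat = np.empty(n)
for f in range(k):
    tr, te = folds != f, folds == f
    mu = y[tr].mean()
    sums = np.bincount(sire[tr], weights=y[tr], minlength=n_sires)
    counts = np.bincount(sire[tr], minlength=n_sires)
    # Shrunken sire mean, a stand-in for a fitted linear, logit, or Poisson model.
    yhat[te] = ((sums + lam * mu) / (counts + lam))[sire[te]]

result = msep_by_status(y, yhat)
print(result)
```

Because most cows have no cases, the overall MSEP is dominated by healthy animals, which is why reporting the healthy and diseased subsets separately can separate models whose overall MSEP is similar.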
Genetics | 2012
Ana I. Vazquez; Gustavo de los Campos; Yann C. Klimentidis; Guilherme J. M. Rosa; Daniel Gianola; Nengjun Yi; David B. Allison
Prediction of genetic risk for disease is needed for preventive and personalized medicine. Genome-wide association studies have found unprecedented numbers of variants associated with complex human traits and diseases. However, these variants explain only a small proportion of genetic risk. Mounting evidence suggests that many traits, relevant to public health, are affected by large numbers of small-effect genes and that prediction of genetic risk to those traits and diseases could be improved by incorporating large numbers of markers into whole-genome prediction (WGP) models. We developed a WGP model incorporating thousands of markers for prediction of skin cancer risk in humans. We also considered other ways of incorporating genetic information into prediction models, such as family history or ancestry (using principal components, PCs, of informative markers). Prediction accuracy was evaluated using the area under the receiver operating characteristic curve (AUC) estimated in a cross-validation. Incorporation of genetic information (i.e., familial relationships, PCs, or WGP) yielded a significant increase in prediction accuracy: from an AUC of 0.53 for a baseline model that accounted for nongenetic covariates to AUCs of 0.58 (pedigree), 0.62 (PCs), and 0.64 (WGP). In summary, prediction of skin cancer risk could be improved by considering genetic information and using a large number of single-nucleotide polymorphisms (SNPs) in a WGP model, which allows for the detection of patterns of genetic risk that are above and beyond those that can be captured using family history. We discuss avenues for improving prediction accuracy and speculate on the possible use of WGP to prospectively identify individuals at high risk.
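AUC can be read as the probability that a randomly chosen case receives a higher predicted risk than a randomly chosen control. The short Python function below computes it in that pairwise (Mann-Whitney) form, counting ties as one half; in a cross-validation it would be applied to the out-of-fold predictions. The example labels and scores are made up.

```python
import numpy as np

def auc(labels, scores):
    """Area under the ROC curve in its Mann-Whitney form: the share of (case, control)
    pairs in which the case has the higher score, with ties counted as one half."""
    labels = np.asarray(labels, dtype=bool)
    scores = np.asarray(scores, dtype=float)
    diff = scores[labels][:, None] - scores[~labels][None, :]
    return float(np.mean((diff > 0) + 0.5 * (diff == 0)))

# Made-up example: two controls (0) and two cases (1) with predicted risks.
print(auc([0, 0, 1, 1], [0.1, 0.4, 0.35, 0.8]))   # 3 of 4 pairs ordered correctly -> 0.75
```

An AUC of 0.5 corresponds to random ordering of cases and controls, which puts the baseline value of 0.53 and the WGP value of 0.64 reported above in perspective.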
Animal Reproduction Science | 2009
G. Quintans; Ana I. Vazquez; K.A. Weigel
Suckling and nutrition are generally recognized as two major factors controlling the duration of the postpartum anovulatory period. In the present study, the effect of premature weaning and suckling restriction with nose plates (NPs) on cow and calf performance was evaluated. The study was conducted over 2 years; primiparous Hereford cows, weighing (mean ± SEM) 344 ± 3.5 kg and with 4.1 ± 0.05 units of body condition score (BCS) (scale 1-8 [Vizcarra, J.A., Ibañez, W., Orcasberro, R., 1986. Repetibilidad y reproductibilidad de dos escalas para estimar la condición corporal de vacas Hereford. Investigaciones Agronómicas 7 (1), 45-47]) at calving, remained with their calves until 72.5 ± 1.2 days postpartum (day 0). They were then assigned to one of three treatments: (i) calves with free access to their dams and ad libitum suckling (S, n = 29); (ii) calves fitted with NPs for 14 days that remained with their dams (NP, n = 29); and (iii) calves weaned from their dams (W, n = 28). All cows were anestrous when treatments commenced (day 0). All cows were blood sampled twice weekly from 1 week before the beginning of the experiment until the end of the mating period (day 74) for progesterone analysis. The mating period began on day 14. Cows in the W treatment ovulated earlier (P < 0.05) than those in the NP and S groups. Cows in the NP group had longer (P < 0.05) intervals between the first progesterone increase and a normal luteal phase than cows in the other two treatment groups (23.3 ± 3.2 vs. 6.5 ± 3.2 and 5.2 ± 3.3 days for NP, S, and W cows, respectively). Fifty percent of the cows with NPs had a short cycle (7 days), but a group of cows had longer (P < 0.05) intervals (66 days) between the first progesterone increase and normal estrous activity. In the NP group, 8 of 29 cows had a short luteal phase followed by a normal one; in 9 of these 29 cows, progesterone concentrations remained low for 6 weeks from the beginning of the treatment; and in 12 of these 29 cows, progesterone concentrations initially increased after treatment initiation, but these animals became anestrous thereafter. Short-term suckling restriction with NPs led to a variable response in primiparous cows of moderate body condition under range conditions.
Frontiers in Genetics | 2012
M. Angeles Pérez-Cabal; Ana I. Vazquez; Daniel Gianola; Guilherme J. M. Rosa; Kent A. Weigel
The impact of the extent of genetic relatedness on the accuracy of genome-enabled predictions was assessed using a dairy cattle population, and alternative cross-validation (CV) strategies were compared. The CV layouts consisted of training and testing sets obtained either from random allocation of individuals (RAN) or from a kernel-based clustering of individuals using the additive relationship matrix, to obtain two subsets that were as unrelated as possible (UNREL), as well as a layout based on stratification by generation (GEN). The UNREL layout decreased the average genetic relationships between training and testing animals but produced accuracies similar to those of the RAN design, which were about 15% higher than in the GEN setting. Results indicate that the CV structure can have an important effect on the accuracy of whole-genome predictions. However, the connection between average genetic relationships across training and testing sets and the estimated predictive ability is not straightforward, and may also depend on the kind of relatedness that exists between the two subsets and on the heritability of the trait. For high-heritability traits, close relatives such as parents and full-sibs make the greatest contributions to accuracy, although half-sibs or grandsires can compensate when close relatives are lacking. However, for low-heritability traits, the inclusion of close relatives is crucial, and including more relatives of various types in the training set tends to lead to greater accuracy. In practice, CV designs should resemble the intended use of the predictive models, e.g., within- or between-family predictions, or within- or across-generation predictions, such that estimation of predictive ability is consistent with the actual application to be considered.
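The average additive relationship between training and testing animals, which the RAN, UNREL, and GEN layouts manipulate, can be computed from a pedigree with the tabular method. The sketch below assumes a hypothetical three-generation pedigree of 12 animals (parents listed before offspring, -1 for unknown parents) and compares a generation-based split with a random split; it does not implement the kernel-based clustering used for the UNREL layout.

```python
import numpy as np

def additive_relationship(sire, dam):
    """Numerator relationship matrix A by the tabular method.
    Animals must be ordered so that parents precede offspring; -1 marks an unknown parent."""
    n = len(sire)
    A = np.zeros((n, n))
    for i in range(n):
        s, d = sire[i], dam[i]
        # Diagonal: 1 plus the inbreeding coefficient (half the parents' relationship).
        A[i, i] = 1.0 + (0.5 * A[s, d] if s >= 0 and d >= 0 else 0.0)
        for j in range(i):
            # Off-diagonal: average of j's relationships with i's known parents.
            a = 0.5 * A[j, s] if s >= 0 else 0.0
            a += 0.5 * A[j, d] if d >= 0 else 0.0
            A[i, j] = A[j, i] = a
    return A

def mean_cross_relationship(A, train, test):
    """Average relationship between every training and every testing animal."""
    return float(A[np.ix_(train, test)].mean())

# Hypothetical pedigree: founders 0-3, full-sib pairs 4-5 (0 x 1) and 6-7 (2 x 3),
# then offspring 8-9 (4 x 6) and 10-11 (5 x 7).
sire = [-1, -1, -1, -1, 0, 0, 2, 2, 4, 4, 5, 5]
dam = [-1, -1, -1, -1, 1, 1, 3, 3, 6, 6, 7, 7]
A = additive_relationship(sire, dam)

gen_train, gen_test = np.arange(8), np.arange(8, 12)     # GEN-like: older animals train
rng = np.random.default_rng(2012)
perm = rng.permutation(12)
ran_train, ran_test = perm[:8], perm[8:]                 # RAN-like: random allocation

print("GEN:", mean_cross_relationship(A, gen_train, gen_test))
print("RAN:", mean_cross_relationship(A, ran_train, ran_test))
```

The average alone hides which kinds of relatives link the two sets (parents, full-sibs, cousins), which is consistent with the point above that the relationship between average relatedness and predictive ability is not straightforward.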
Frontiers in Genetics | 2013
Yann C. Klimentidis; Ana I. Vazquez; Gustavo de los Campos; David B. Allison; Mark T. Dransfield; Victor J. Thannickal
Asthma and chronic obstructive pulmonary disease (COPD) are major worldwide health problems. Pulmonary function testing is a useful diagnostic tool for these diseases, and is known to be influenced by genetic and environmental factors. Previous studies have demonstrated that a substantial proportion of the variation in pulmonary function phenotypes can be explained by familial relationships. The availability of whole-genome single nucleotide polymorphism (SNP) data enables us to further evaluate the extent to which genetic factors account for variation in pulmonary function and to compare pedigree- to SNP-based estimates of heritability. Here, we employ methods developed in the animal breeding field to estimate the heritability of forced expiratory volume in one second (FEV1), forced vital capacity (FVC), and the ratio of these two measures (FEV1/FVC) among subjects in the Framingham Heart Study dataset. We compare heritability estimates based on pedigree-based relationships to those based on genome-wide SNPs. We find that, in a family-based study, estimates of heritability using SNP data are nearly identical to estimates based on pedigree information, and range from 0.50 for FEV1 to 0.66 for FEV1/FVC. Therefore, we conclude that genetic factors account for a sizable proportion of inter-individual differences in pulmonary function, and that estimates of heritability based on SNP data are nearly identical to estimates based on pedigree data. Finally, our findings suggest a higher heritability for FEV1/FVC compared to either FEV1 or FVC.
Journal of Dairy Science | 2009
Ana I. Vazquez; K.A. Weigel; Daniel Gianola; D.M. Bates; M.A. Pérez-Cabal; Guilherme J. M. Rosa; Y.M. Chang
Typically, clinical mastitis is coded as the presence or absence of disease in a given lactation, and records are analyzed with either linear models or binary threshold models. Because the presence of mastitis may include cows with multiple episodes, information is lost when counts are treated as binary responses. Poisson models are appropriate for random variables measured as the number of events, and although these models are used extensively in studying the epidemiology of mastitis, they have rarely been used for studying the genetic aspects of mastitis. Ordinal threshold models are pertinent for ordered categorical responses; although one can hypothesize that the number of clinical mastitis episodes per animal reflects a continuous underlying increase in mastitis susceptibility, these models have rarely been used in genetic analysis of mastitis. The objective of this study was to compare probit, Poisson, and ordinal threshold models for the genetic evaluation of US Holstein sires for clinical mastitis. Mastitis was measured as a binary trait or as the number of mastitis cases. Data from 44,908 first-parity cows recorded in on-farm herd management software were gathered, edited, and processed for the present study. The cows were daughters of 1,861 sires, distributed over 94 herds. Predictive ability was assessed via 5-fold cross-validation using 2 loss functions: mean squared error of prediction (MSEP) as the end point and a cost difference function. The heritability estimates were 0.061 for mastitis measured as a binary trait in the probit model and 0.085 and 0.132 for the number of mastitis cases in the ordinal threshold and Poisson models, respectively; because of scale differences, only the probit and ordinal threshold models are directly comparable. Among healthy animals, MSEP was smallest for the probit model, and the cost function was smallest for the ordinal threshold model. Among diseased animals, MSEP and the cost function were smallest for the Poisson model, followed by the ordinal threshold model. In general, the models for count variables more accurately identified diseased animals and more accurately predicted mastitis costs. Healthy animals were more accurately identified by the probit model.