A. Legarra
Institut national de la recherche agronomique
Publication
Featured research published by A. Legarra.
Journal of Dairy Science | 2010
I. Aguilar; I. Misztal; D.L. Johnson; A. Legarra; S. Tsuruta; T.J. Lawlor
The first national single-step, full-information (phenotype, pedigree, and marker genotype) genetic evaluation was developed for final score of US Holsteins. Data included final scores recorded from 1955 to 2009 for 6,232,548 Holstein cows. BovineSNP50 (Illumina, San Diego, CA) genotypes from the Cooperative Dairy DNA Repository (Beltsville, MD) were available for 6,508 bulls. Three analyses used a repeatability animal model as currently used for the national US evaluation. The first 2 analyses used final scores recorded up to 2004. The first analysis used only a pedigree-based relationship matrix. The second analysis used a relationship matrix based on both pedigree and genomic information (single-step approach). The third analysis used the complete data set and only the pedigree-based relationship matrix. The fourth analysis used predictions from the first analysis (final scores up to 2004 and only a pedigree-based relationship matrix) and prediction using a genomic-based matrix to obtain genetic evaluation (multiple-step approach). Different allele frequencies were tested in construction of the genomic relationship matrix. Coefficients of determination between predictions of young bulls from parent average, single-step, and multiple-step approaches and their 2009 daughter deviations were 0.24, 0.37 to 0.41, and 0.40, respectively. The highest coefficient of determination for a single-step approach was observed when using a genomic relationship matrix with assumed allele frequencies of 0.5. Coefficients for regression of 2009 daughter deviations on parent-average, single-step, and multiple-step predictions were 0.76, 0.68 to 0.79, and 0.86, respectively, which indicated some inflation of predictions. The single-step regression coefficient could be increased up to 0.92 by scaling differences between the genomic and pedigree-based relationship matrices with little loss in accuracy of prediction. One complete evaluation took about 2 h of computing time and 2.7 gigabytes of memory. Computing times for single-step analyses were slightly longer (2%) than for pedigree-based analysis. A national single-step genetic evaluation with the pedigree relationship matrix augmented with genomic information provided genomic predictions with accuracy and bias comparable to multiple-step procedures and could account for any population or data structure. Advantages of single-step evaluations should increase in the future when animals are pre-selected on genotypes.
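The abstract above notes that different allele frequencies were tested when building the genomic relationship matrix. As a rough illustration only, the sketch below uses one widely used construction (genotypes centered by twice the allele frequency and scaled by 2Σp(1−p)). The function name, the 0/1/2 genotype coding, and the toy data are assumptions for illustration and are not taken from the paper.

```python
def genomic_relationship(genotypes, allele_freqs):
    """Build a genomic relationship matrix from 0/1/2 genotype codes.

    genotypes: one row per animal, one column per marker (assumed coding:
               number of copies of the allele whose frequency is given).
    allele_freqs: assumed frequency per marker (e.g. 0.5 for every marker).
    """
    # Center each genotype by its expectation, 2p.
    z = [[g - 2.0 * p for g, p in zip(row, allele_freqs)] for row in genotypes]
    # Scale so the matrix is on a scale comparable to pedigree relationships.
    scale = 2.0 * sum(p * (1.0 - p) for p in allele_freqs)
    n = len(z)
    return [
        [sum(a * b for a, b in zip(z[i], z[j])) / scale for j in range(n)]
        for i in range(n)
    ]


# Toy example: three animals, two markers, all frequencies assumed to be 0.5.
toy_genotypes = [[2, 0], [0, 2], [1, 1]]
G = genomic_relationship(toy_genotypes, [0.5, 0.5])
```

Passing observed frequencies instead of 0.5 changes both the centering and the scaling, which is the kind of choice the abstract says was compared.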
Genetics | 2009
Gustavo de los Campos; Hugo Naya; Daniel Gianola; José Crossa; A. Legarra; Eduardo Manfredi; Kent A. Weigel; José Miguel Cotes
The availability of genomewide dense markers brings opportunities and challenges to breeding programs. An important question concerns the ways in which dense markers and pedigrees, together with phenotypic records, should be used to arrive at predictions of genetic values for complex traits. If a large number of markers are included in a regression model, marker-specific shrinkage of regression coefficients may be needed. For this reason, the Bayesian least absolute shrinkage and selection operator (LASSO) (BL) appears to be an interesting approach for fitting marker effects in a regression model. This article adapts the BL to arrive at a regression model where markers, pedigrees, and covariates other than markers are considered jointly. Connections between BL and other marker-based regression models are discussed, and the sensitivity of BL with respect to the choice of prior distributions assigned to key parameters is evaluated using simulation. The proposed model was fitted to two data sets from wheat and mouse populations, and evaluated using cross-validation methods. Results indicate that inclusion of markers in the regression further improved the predictive ability of models. An R program that implements the proposed model is freely available.
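To make the structure of such a joint model concrete, one possible way of writing it is shown below. The notation is assumed for illustration: x_{ij} are non-marker covariates, z_{ik} are marker genotypes, A is the pedigree relationship matrix, and the double-exponential prior is written in one common parameterization that may differ from the paper's.

```latex
y_i = \mu + \sum_{j} x_{ij}\,\beta_j + \sum_{k} z_{ik}\,b_k + u_i + \varepsilon_i,
\qquad
\mathbf{u} \sim N\!\left(\mathbf{0}, \mathbf{A}\sigma_u^2\right),
\qquad
\boldsymbol{\varepsilon} \sim N\!\left(\mathbf{0}, \mathbf{I}\sigma_\varepsilon^2\right),
\\[6pt]
p\!\left(b_k \mid \lambda, \sigma_\varepsilon\right)
  = \frac{\lambda}{2\sigma_\varepsilon}
    \exp\!\left(-\frac{\lambda\,\lvert b_k\rvert}{\sigma_\varepsilon}\right)
```

The heavy tails of the double-exponential prior give marker-specific shrinkage: small effects are pulled strongly toward zero while large effects are shrunk less than under a normal prior.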
Journal of Dairy Science | 2009
A. Legarra; I. Aguilar; I. Misztal
Dense molecular markers are being used in genetic evaluation for parts of the population. This requires a two-step procedure where pseudo-data (for instance, daughter yield deviations) are computed from full records and pedigree data and later used for genomic evaluation. This results in bias and loss of information. One way to incorporate the genomic information into a full genetic evaluation is by modifying the numerator relationship matrix. A naive proposal is to substitute the relationships of genotyped animals with the genomic relationship matrix. However, this results in incoherencies because the genomic relationship matrix includes information on relationships among ancestors and descendants. In other words, using the pedigree-derived covariance between genotyped and ungenotyped individuals, with the pretense that genomic information does not exist, leads to inconsistencies. It is proposed to condition the genetic value of ungenotyped animals on the genetic value of genotyped animals via the selection index (e.g., pedigree information), and then use the genomic relationship matrix for the latter. This results in a joint distribution of genotyped and ungenotyped genetic values, with a pedigree-genomic relationship matrix H. In this matrix, genomic information is transmitted to the covariances among all ungenotyped individuals. The matrix is positive (semi)definite by construction, which is not the case for the naive approach. Numerical examples and alternative expressions are discussed. Matrix H is suitable for iteration-on-data algorithms that multiply a vector times a matrix, such as preconditioned conjugate gradients.
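The conditioning argument can be written out as a short derivation. The notation here is assumed for illustration: subscript 1 denotes ungenotyped animals, subscript 2 genotyped animals, A is the pedigree relationship matrix partitioned accordingly, and G is the genomic relationship matrix.

```latex
\mathbf{u}_1 = \mathbf{A}_{12}\mathbf{A}_{22}^{-1}\mathbf{u}_2 + \boldsymbol{\epsilon},
\qquad
\operatorname{Var}(\boldsymbol{\epsilon})
  = \mathbf{A}_{11} - \mathbf{A}_{12}\mathbf{A}_{22}^{-1}\mathbf{A}_{21},
\qquad
\operatorname{Var}(\mathbf{u}_2) = \mathbf{G}
\\[6pt]
\mathbf{H} =
\begin{pmatrix}
\mathbf{A}_{11} + \mathbf{A}_{12}\mathbf{A}_{22}^{-1}
  \left(\mathbf{G} - \mathbf{A}_{22}\right)
  \mathbf{A}_{22}^{-1}\mathbf{A}_{21}
  & \mathbf{A}_{12}\mathbf{A}_{22}^{-1}\mathbf{G} \\[4pt]
\mathbf{G}\mathbf{A}_{22}^{-1}\mathbf{A}_{21}
  & \mathbf{G}
\end{pmatrix}
```

A quick sanity check on this form: setting G = A_{22} makes the correction term vanish and returns H = A, so the pedigree-only evaluation is recovered when the markers add no information.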
Journal of Dairy Science | 2009
I. Misztal; A. Legarra; I. Aguilar
Currently, genomic evaluations use multiple-step procedures, which are prone to biases and errors. A single-step procedure may be applicable when genomic predictions can be obtained by modifying the numerator relationship matrix A to H = A + A(Delta), where A(Delta) includes deviations from expected relationships. However, the traditional mixed model equations require H(-1), which is usually difficult to obtain for large pedigrees. The computations with H are feasible when the mixed model equations are expressed in an alternate form that also applies for singular H and when those equations are solved by conjugate gradient techniques. Then the only computations involving H are in the form of Aq or A(Delta)q, where q is a vector. The alternative equations have a nonsymmetric left-hand side. Computing A(Delta)q is inexpensive when the number of nonzeros in A(Delta) is small, and the product Aq can be calculated efficiently in linear time using an indirect algorithm. Generalizations to more complicated models are proposed. The data included 10.2 million final scores on 6.2 million Holsteins and were analyzed by a repeatability model. Comparisons involved the regular and the alternative equations. The model for the second case included simulated A(Delta). Solutions were obtained by the preconditioned conjugate gradient algorithm, which works only with symmetric matrices, and by the bi-conjugate gradient stabilized algorithm, which also works with nonsymmetric matrices. The convergence rate associated with the nonsymmetric solvers was slightly better than that with the symmetric solver for the original equations, although the time per round was twice as much for the nonsymmetric solvers. The convergence rate associated with the alternative equations ranged from 2 times lower without A(Delta) to 3 times lower for the largest simulated A(Delta). When the information attributable to genomics can be expressed as modifications to the numerator relationship matrix, the proposed methodology may allow the upgrading of an existing evaluation to incorporate the genomic information.
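To illustrate how a product such as Hq = Aq + A(Delta)q can be formed without ever storing A, here is a minimal sketch. It relies on the standard pedigree decomposition A = T D T' and, as simplifying assumptions, ignores inbreeding and requires parents to be listed before their offspring. The function names and the dictionary storage for A(Delta) are hypothetical, and this is not presented as the algorithm used in the paper.

```python
def pedigree_times_vector(parents, q):
    """Return A q for a non-inbred pedigree without forming A.

    parents[i] is a (sire, dam) tuple of indices, with None for unknown;
    animals must be ordered so that parents precede their offspring.
    """
    n = len(parents)
    y = list(q)
    # Backward pass (y = T' q): push half of each animal's value to its parents.
    for i in range(n - 1, -1, -1):
        for p in parents[i]:
            if p is not None:
                y[p] += 0.5 * y[i]
    # Diagonal scaling by Mendelian sampling variances: 1, 0.75 or 0.5
    # for zero, one or two known parents (no inbreeding assumed).
    for i in range(n):
        known = sum(p is not None for p in parents[i])
        y[i] *= 1.0 - 0.25 * known
    # Forward pass (z = T y): add half of each parent's finished value.
    for i in range(n):
        for p in parents[i]:
            if p is not None:
                y[i] += 0.5 * y[p]
    return y


def h_times_vector(parents, a_delta, q):
    """Return (A + A_delta) q, with A_delta stored as {(row, col): value}."""
    hq = pedigree_times_vector(parents, q)
    for (i, j), value in a_delta.items():
        hq[i] += value * q[j]
    return hq


# Two unrelated founders (0, 1) and their offspring (2).
toy_parents = [(None, None), (None, None), (0, 1)]
toy_delta = {(0, 1): 0.1, (1, 0): 0.1}
product = h_times_vector(toy_parents, toy_delta, [1.0, 0.0, 0.0])
```

Each pass touches every animal once, so the cost grows linearly with pedigree size, and the sparse term costs one multiplication per stored nonzero of A(Delta).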
Genetics | 2008
A. Legarra; Christèle Robert-Granié; Eduardo Manfredi; Jean-Michel Elsen
Selection plans in plant and animal breeding are driven by genetic evaluation. Recent developments suggest using massive genetic marker information, known as “genomic selection.” There is little evidence of its performance, though. We empirically compared three strategies for selection: (1) use of pedigree and phenotypic information, (2) use of genomewide markers and phenotypic information, and (3) the combination of both. We analyzed four traits from a heterogeneous mouse population (http://gscan.well.ox.ac.uk/), including 1884 individuals and 10,946 SNP markers. We used linear mixed models, using extensions of association analysis. Cross-validation techniques were used, providing assumption-free estimates of predictive ability. Sampling of validation and training data sets was carried out across and within families, which allows comparing across- and within-family information. Use of genomewide genetic markers increased predictive ability up to 0.22 across families and up to 0.03 within families. The latter is not statistically significant. These values are roughly comparable to increases of up to 0.57 (across family) and 0.14 (within family) in accuracy of prediction of genetic value. In this data set, within-family information was more accurate than across-family information, and populational linkage disequilibrium was not a completely accurate source of information for genetic evaluation. This fact questions some applications of genomic selection.
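The difference between across-family and within-family validation can be sketched as two ways of assigning individuals to folds. This is a simplified, deterministic illustration; the function names and the toy family labels are hypothetical, and the study itself sampled training and validation sets rather than using a fixed assignment like this one.

```python
def across_family_folds(family_of, k):
    """Assign whole families to folds, so validation families never
    contribute records to the training data."""
    families = sorted(set(family_of.values()))
    fold_of_family = {fam: idx % k for idx, fam in enumerate(families)}
    return {ind: fold_of_family[fam] for ind, fam in family_of.items()}


def within_family_folds(family_of, k):
    """Spread each family's members over the folds, so every validation
    individual has relatives from its own family in the training data."""
    seen = {}
    folds = {}
    for ind in sorted(family_of):
        fam = family_of[ind]
        folds[ind] = seen.get(fam, 0) % k
        seen[fam] = seen.get(fam, 0) + 1
    return folds


toy_families = {"a1": "A", "a2": "A", "b1": "B", "b2": "B"}
across = across_family_folds(toy_families, 2)
within = within_family_folds(toy_families, 2)
```

With the across-family assignment, predictions for a validation family can only draw on information shared across families, whereas the within-family assignment lets close relatives inform each prediction.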
Genetics Research | 2011
Zulma G. Vitezica; I. Aguilar; I. Misztal; A. Legarra
Prediction of genetic merit or disease risk using genetic marker information is becoming a common practice for selection of livestock and plant species. For the successful application of genome-wide marker-assisted selection (GWMAS), genomic predictions should be accurate and unbiased. The effect of selection on bias and accuracy of genomic predictions was studied in two simulated animal populations under weak or strong selection and with several heritabilities. Prediction of genetic values was by best linear unbiased prediction (BLUP) using data either from relatives summarized in pseudodata for genotyped individuals (multiple-step method) or using all available data jointly (single-step method). The single-step method combined genomic- and pedigree-based relationship matrices. Predictions by the multiple-step method were biased. Predictions by a single-step method were less biased and more accurate but under strong selection were less accurate. When genomic relationships were shifted by a constant, the single-step method was unbiased and the most accurate. The value of that constant, which adjusts for non-random selection of genotyped individuals, can be derived analytically.
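Shifting genomic relationships by a constant amounts to adding the same value to every element of the genomic relationship matrix. The sketch below shows that operation. The helper that picks the constant as the difference between average pedigree and average genomic relationships among genotyped animals is an assumption made for illustration, since the abstract only states that the constant can be derived analytically; both function names are hypothetical.

```python
def shift_relationships(g, alpha):
    """Add the constant alpha to every element of the matrix g."""
    return [[value + alpha for value in row] for row in g]


def mean_difference(a22, g):
    """Assumed choice of constant: mean pedigree relationship among
    genotyped animals minus mean genomic relationship."""
    n = len(g)
    return (sum(map(sum, a22)) - sum(map(sum, g))) / (n * n)


toy_a22 = [[1.0, 0.5], [0.5, 1.0]]
toy_g = [[0.9, 0.3], [0.3, 0.9]]
alpha = mean_difference(toy_a22, toy_g)
shifted = shift_relationships(toy_g, alpha)
```

After the shift, the average of the genomic relationships matches the average of the pedigree relationships for the same animals, which puts the two matrices on a common base.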
Genetics Research | 2012
H. Wang; I. Misztal; I. Aguilar; A. Legarra; William M. Muir
A common problem for genome-wide association analysis (GWAS) is lack of power for detection of quantitative trait loci (QTLs) and precision for fine mapping. Here, we present a statistical method, termed single-step GBLUP (ssGBLUP), which increases both power and precision without increasing genotyping costs by taking advantage of phenotypes from other related and unrelated subjects. The procedure achieves these goals by blending traditional pedigree relationships with those derived from genetic markers, and by conversion of estimated breeding values (EBVs) to marker effects and weights. Additionally, the application of mixed model approaches allows for both simple and complex analyses that involve multiple traits and confounding factors, such as environmental, epigenetic or maternal environmental effects. Efficiency of the method was examined using simulations with 15,800 subjects, of which 1500 were genotyped. Thirty QTLs were simulated across the genome and assumed heritability was 0·5. Comparisons included ssGBLUP applied directly to phenotypes, BayesB and classical GWAS (CGWAS) with deregressed proofs. An average prediction accuracy of 0·89 was obtained by ssGBLUP after one iteration, which was 0·01 higher than by BayesB. Power and precision for GWAS applications were evaluated by the correlation between true QTL effects and the sum of m adjacent single nucleotide polymorphism (SNP) effects. The highest correlations were 0·82 and 0·74 for ssGBLUP and CGWAS with m=8, and 0·83 for BayesB with m=16. Standard deviations of the correlations across replicates were several times higher in BayesB than in ssGBLUP. The ssGBLUP method with marker weights is faster, more accurate and easier to implement for GWAS applications without computing pseudo-data.
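The evaluation criterion above relies on sums of m adjacent SNP effects. A minimal sketch of computing such sums is given below. It assumes windows that slide by one SNP along a single ordered list of effects, which the abstract does not specify, and the function name is hypothetical.

```python
def window_sums(effects, m):
    """Return the sum of every run of m adjacent SNP effects,
    sliding the window forward one SNP at a time."""
    if m < 1 or m > len(effects):
        raise ValueError("m must be between 1 and the number of SNPs")
    total = sum(effects[:m])
    sums = [total]
    for i in range(m, len(effects)):
        # Add the SNP entering the window and drop the one leaving it.
        total += effects[i] - effects[i - m]
        sums.append(total)
    return sums


toy_effects = [1, 2, 3, 4]
toy_sums = window_sums(toy_effects, 2)
```

Summing over neighbouring SNPs pools the signal that linkage disequilibrium spreads across markers near a QTL, which is why the best window size can differ between methods.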
Genetics | 2013
Zulma G. Vitezica; L. Varona; A. Legarra
Genomic evaluation models can fit additive and dominant SNP effects. Under quantitative genetics theory, additive or “breeding” values of individuals are generated by substitution effects, which involve both “biological” additive and dominant effects of the markers. Dominance deviations include only a portion of the biological dominant effects of the markers. Additive variance includes variation due to the additive and dominant effects of the markers. We describe a matrix of dominant genomic relationships across individuals, D, which is similar to the G matrix used in genomic best linear unbiased prediction. This matrix can be used in a mixed-model context for genomic evaluations or to estimate dominant and additive variances in the population. From the “genotypic” value of individuals, an alternative parameterization defines additive and dominance as the parts attributable to the additive and dominant effect of the markers. This approach underestimates the additive genetic variance and overestimates the dominance variance. Transforming the variances from one model into the other is trivial if the distribution of allelic frequencies is known. We illustrate these results with mouse data (four traits, 1884 mice, and 10,946 markers) and simulated data (2100 individuals and 10,000 markers). Variance components were estimated correctly in the model, considering breeding values and dominance deviations. For the model considering genotypic values, the inclusion of dominant effects biased the estimate of additive variance. Genomic models were more accurate for the estimation of variance components than their pedigree-based counterparts.
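A dominance relationship matrix analogous to G can be sketched as follows. The coding used here (−2q² for one homozygote, 2pq for the heterozygote, −2p² for the other homozygote, scaled by Σ(2pq)²) is the commonly cited form for the breeding-value and dominance-deviation parameterization; the function name, the 0/1/2 genotype coding, and the toy data are assumptions for illustration.

```python
def dominance_relationship(genotypes, allele_freqs):
    """Build a dominance genomic relationship matrix.

    genotypes: rows of 0/1/2 codes, counting copies of the allele whose
               frequency p is listed in allele_freqs (assumed coding).
    """
    def code(g, p):
        q = 1.0 - p
        if g == 2:
            return -2.0 * q * q
        if g == 1:
            return 2.0 * p * q
        return -2.0 * p * p

    w = [[code(g, p) for g, p in zip(row, allele_freqs)] for row in genotypes]
    # Scaling by the sum over markers of (2pq)^2.
    scale = sum((2.0 * p * (1.0 - p)) ** 2 for p in allele_freqs)
    n = len(w)
    return [
        [sum(a * b for a, b in zip(w[i], w[j])) / scale for j in range(n)]
        for i in range(n)
    ]


# One marker with frequency 0.5: a homozygote and a heterozygote.
D = dominance_relationship([[2], [1]], [0.5])
```

Under Hardy-Weinberg proportions the three codes have zero mean at each marker, so dominance deviations defined this way are uncorrelated with the additive (breeding value) part.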
Genetics Research | 2011
A. Legarra; Christèle Robert-Granié; Pascal Croiseau; François Guillaume; Sébastien Fritz
Empirical experience with genomic selection in dairy cattle suggests that the distribution of the effects of single nucleotide polymorphisms (SNPs) might be far from normality for some traits. An alternative, avoiding the use of arbitrary prior information, is the Bayesian Lasso (BL). Regular BL uses a common variance parameter for residual and SNP effects (BL1Var). We propose here a BL with different residual and SNP effect variances (BL2Var), equivalent to the original Lasso formulation. The λ parameter in Lasso is related to genetic variation in the population. We also suggest precomputing individual variances of SNP effects by BL2Var, to be later used in a linear mixed model (HetVar-GBLUP). Models were tested in a cross-validation design including 1756 Holstein and 678 Montbéliarde French bulls, with 1216 and 451 bulls used as training data; 51 325 and 49 625 polymorphic SNP were used. Milk production traits were tested. Other methods tested included linear mixed models using variances inferred from pedigree estimates or integrated out from the data. Estimates of genetic variation in the population were close to pedigree estimates in BL2Var but not in BL1Var. BL1Var shrank breeding values too little because of the common variance. BL2Var was the most accurate method for prediction and accommodated well major genes, in particular for fat percentage. BL1Var was the least accurate. HetVar-GBLUP was almost as accurate as BL2Var and allows for simple computations and extensions.
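The contrast between the two formulations can be summarized in terms of the conditional variance assigned to each SNP effect. The notation below is assumed for illustration (a_i for the effect of SNP i, τ_i² for its individual variance parameter, σ_e² for the residual variance, p_i and q_i for allele frequencies), and the last relation holds only under the usual simplifying assumptions of Hardy-Weinberg and linkage equilibrium.

```latex
\text{BL1Var:}\quad a_i \mid \tau_i^2, \sigma_e^2 \sim N\!\left(0,\; \sigma_e^2\,\tau_i^2\right)
\qquad\qquad
\text{BL2Var:}\quad a_i \mid \tau_i^2 \sim N\!\left(0,\; \tau_i^2\right)
\\[6pt]
\text{BL2Var marginal prior:}\quad
p(a_i \mid \lambda) = \frac{\lambda}{2}\exp\!\left(-\lambda\,\lvert a_i\rvert\right),
\qquad
\operatorname{Var}(a_i \mid \lambda) = \frac{2}{\lambda^2}
\\[6pt]
\sigma_u^2 \approx 2\sum_i p_i q_i \cdot \frac{2}{\lambda^2}
```

Read this way, λ is not just a tuning parameter: a larger λ implies a smaller prior variance per SNP and therefore a smaller implied genetic variance in the population.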
Genetics Selection Evolution | 2012
Sofiene Karoui; María J. Carabaño; Clara Díaz; A. Legarra
Background: Using a multi-breed reference population might be a way of increasing the accuracy of genomic breeding values in small breeds. Models involving mixed-breed data do not take into account the fact that marker effects may differ among breeds. This study was aimed at investigating the impact on accuracy of increasing the number of genotyped candidates in the training set by using a multi-breed reference population, in contrast to single-breed genomic evaluations.
Methods: Three traits (milk production, fat content and female fertility) were analyzed by genomic mixed linear models and Bayesian methodology. Three breeds of French dairy cattle were used: Holstein, Montbéliarde and Normande with 2976, 950 and 970 bulls in the training population, respectively and 964, 222 and 248 bulls in the validation population, respectively. All animals were genotyped with the Illumina Bovine SNP50 array. Accuracy of genomic breeding values was evaluated under three scenarios for the correlation of genomic breeding values between breeds (rg): uncorrelated (1), rg = 0; estimated rg (2); high, rg = 0.95 (3). Accuracy and bias of predictions obtained in the validation population with the multi-breed training set were assessed by the coefficient of determination (R²) and by the regression coefficient of daughter yield deviations of validation bulls on their predicted genomic breeding values, respectively.
Results: The genetic variation captured by the markers for each trait was similar to that estimated for routine pedigree-based genetic evaluation. Posterior means for rg ranged from −0.01 for fertility between Montbéliarde and Normande to 0.79 for milk yield between Montbéliarde and Holstein. Differences in R² between the three scenarios were notable only for fat content in the Montbéliarde breed: from 0.27 in scenario (1) to 0.33 in scenarios (2) and (3). Accuracies for fertility were lower than for other traits.
Conclusions: Using a multi-breed reference population resulted in small or no increases in accuracy. Only the breed with a small data set and large genetic correlation with the breed with a large data set showed increased accuracy for the traits with moderate (milk) to high (fat content) heritability. No benefit was observed for fertility, a lowly heritable trait.
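The two validation statistics described above (coefficient of determination as a measure of accuracy, and the regression coefficient of daughter yield deviations on predictions as a measure of bias) can be computed as in the following sketch. The function name and the toy numbers are hypothetical and are not data from the study.

```python
def validation_stats(predictions, deviations):
    """Return (slope, r_squared) for the simple regression of observed
    daughter yield deviations on predicted genomic breeding values."""
    n = len(predictions)
    mean_x = sum(predictions) / n
    mean_y = sum(deviations) / n
    sxx = sum((x - mean_x) ** 2 for x in predictions)
    syy = sum((y - mean_y) ** 2 for y in deviations)
    sxy = sum((x - mean_x) * (y - mean_y)
              for x, y in zip(predictions, deviations))
    slope = sxy / sxx                   # values below 1 suggest inflated predictions
    r_squared = sxy * sxy / (sxx * syy) # squared correlation
    return slope, r_squared


toy_predictions = [1.0, 2.0, 3.0]
toy_deviations = [2.0, 4.0, 6.0]
slope, r2 = validation_stats(toy_predictions, toy_deviations)
```

A slope close to 1 indicates predictions on the right scale, while R² summarizes how much of the variation in daughter yield deviations the predictions explain.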