Diego Jarquin
University of Nebraska–Lincoln
Publication
Featured research published by Diego Jarquin.
BMC Genomics | 2014
Diego Jarquin; Kyle Kocak; Luis Posadas; Katie E. Hyma; Joseph Jedlicka; George L. Graef; Aaron J. Lorenz
Background: Advances in genotyping technology, such as genotyping by sequencing (GBS), are making genomic prediction more attractive as a way to reduce breeding cycle times and costs associated with phenotyping. Genomic prediction and selection have been studied in several crop species, but no reports exist in soybean. The objectives of this study were to (i) evaluate prospects for genomic selection using GBS in a typical soybean breeding program and (ii) evaluate the effect of GBS marker selection and imputation on genomic prediction accuracy. To achieve these objectives, a set of soybean lines sampled from the University of Nebraska Soybean Breeding Program was genotyped using GBS and evaluated for yield and other agronomic traits at multiple Nebraska locations. Results: Genotyping by sequencing scored 16,502 single nucleotide polymorphisms (SNPs) with minor-allele frequency (MAF) > 0.05 and percentage of missing values ≤ 5% on 301 elite soybean breeding lines. When SNPs with up to 80% missing values were included, 52,349 SNPs were scored. Prediction accuracy for grain yield, assessed using cross validation, was estimated to be 0.64, indicating good potential for using genomic selection for grain yield in soybean. Filtering SNPs based on missing data percentage had little to no effect on prediction accuracy, especially when random forest imputation was used to impute missing values. The highest accuracies were observed when random forest imputation was used on all SNPs, but differences were not significant. A standard additive G-BLUP model was robust; modeling additive-by-additive epistasis did not provide any improvement in prediction accuracy. The effect of training population size on accuracy began to plateau around 100, but accuracy steadily climbed until the largest possible size was used in this analysis. Including only SNPs with MAF > 0.30 provided higher accuracies when training populations were smaller. Conclusions: Using GBS for genomic prediction in soybean holds good potential to expedite genetic gain. Our results suggest that standard additive G-BLUP models can be used on unfiltered, imputed GBS data without loss in accuracy.
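The SNP filtering described in this abstract (MAF > 0.05 with at most 5% missing values, or a relaxed limit of 80% missing) can be illustrated with a minimal sketch. It is not the authors' pipeline: the 0/1/2 genotype coding, the use of `None` for a missing call, and the function names `snp_stats` and `filter_snps` are assumptions made here for illustration.

```python
from typing import List, Optional, Sequence, Tuple

# Assumed coding: 0, 1, or 2 copies of the alternate allele; None = missing call.
Genotype = Optional[int]


def snp_stats(calls: Sequence[Genotype]) -> Tuple[float, float]:
    """Return (missing_rate, minor_allele_frequency) for one SNP."""
    observed = [g for g in calls if g is not None]
    missing_rate = 1.0 - len(observed) / len(calls)
    if not observed:
        return missing_rate, 0.0
    # Each diploid line carries two alleles, hence the factor of 2.
    alt_freq = sum(observed) / (2 * len(observed))
    return missing_rate, min(alt_freq, 1.0 - alt_freq)


def filter_snps(
    snps: Sequence[Sequence[Genotype]],
    maf_min: float = 0.05,
    max_missing: float = 0.05,
) -> List[int]:
    """Indices of SNPs with MAF > maf_min and missing rate <= max_missing."""
    kept = []
    for idx, calls in enumerate(snps):
        missing_rate, maf = snp_stats(calls)
        if maf > maf_min and missing_rate <= max_missing:
            kept.append(idx)
    return kept
```

Calling `filter_snps(snps, max_missing=0.80)` mimics the relaxed threshold mentioned in the abstract; the retained SNPs would then still need imputation before model fitting.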
Trends in Plant Science | 2017
José Crossa; Paulino Pérez-Rodríguez; Jaime Cuevas; Osval A. Montesinos-López; Diego Jarquin; Gustavo de los Campos; Juan Burgueño; Juan Manuel González-Camacho; Sergio Pérez-Elizalde; Yoseph Beyene; Susanne Dreisigacker; Ravi P. Singh; Xuecai Zhang; Manje Gowda; Manish Roorkiwal; Jessica Rutkoski; Rajeev K. Varshney
Genomic selection (GS) facilitates the rapid selection of superior genotypes and accelerates the breeding cycle. In this review, we discuss the history, principles, and basis of GS and genomic-enabled prediction (GP) as well as the genetics and statistical complexities of GP models, including genomic genotype×environment (G×E) interactions. We also examine the accuracy of GP models and methods for two cereal crops and two legume crops based on random cross-validation. GS applied to maize breeding has shown tangible genetic gains. Based on GP results, we speculate how GS in germplasm enhancement (i.e., prebreeding) programs could accelerate the flow of genes from gene bank accessions to elite lines. Recent advances in hyperspectral image technology could be combined with GS and pedigree-assisted breeding.
The Plant Genome | 2015
Nonoy Bandillo; Diego Jarquin; Qijian Song; Randall L. Nelson; Perry B. Cregan; James E. Specht; Aaron J. Lorenz
Population structure analyses and genome‐wide association studies (GWAS) conducted on crop germplasm collections provide valuable information on the frequency and distribution of alleles governing economically important traits. The value of these analyses is substantially enhanced when the accession numbers can be increased from ∼1,000 to ∼10,000 or more. In this research, we conducted the first comprehensive analysis of population structure on the collection of 14,000 soybean accessions [Glycine max (L.) Merr. and G. soja Siebold & Zucc.] using a 50K‐SNP chip. Accessions originating from Japan were relatively homogenous and distinct from the Korean accessions. As a whole, both Japanese and Korean accessions diverged from the Chinese accessions. The ancestry of founders of the American accessions derived mostly from two Chinese subpopulations, which reflects the composition of the American accessions as a whole. A 12,000 accession GWAS conducted on seed protein and oil is the largest reported to date in plants and identified single nucleotide polymorphisms (SNPs) with strong signals on chromosomes 20 and 15. A chromosome 20 region previously reported to be important for protein and oil content was further narrowed and now contains only three plausible candidate genes. The haplotype effects show a strong negative relationship between oil and protein at this locus, indicating negative pleiotropic effects or multiple closely linked loci in repulsion phase linkage. The vast majority of accessions carry the haplotype allele conferring lower protein and higher oil. Our results provide a fuller understanding of the distribution of genetic variation contained within the USDA soybean collection and how it relates to phenotypic variation for economically important traits.
G3: Genes, Genomes, Genetics | 2016
José Crossa; Diego Jarquin; Jorge Franco; Paulino Pérez-Rodríguez; Juan Burgueño; Carolina Saint-Pierre; Prashant Vikram; Carolina Paola Sansaloni; Cesar Petroli; Deniz Akdemir; Clay H. Sneller; Matthew P. Reynolds; Maria Tattaris; Thomas Payne; Carlos Guzmán; Roberto J. Peña; Peter Wenzl; Sukhwinder Singh
This study examines genomic prediction within 8416 Mexican landrace accessions and 2403 Iranian landrace accessions stored in gene banks. The Mexican and Iranian collections were evaluated in separate field trials, including an optimum environment for several traits, and in two separate environments (drought, D and heat, H) for the highly heritable traits, days to heading (DTH), and days to maturity (DTM). Analyses accounting and not accounting for population structure were performed. Genomic prediction models include genotype × environment interaction (G × E). Two alternative prediction strategies were studied: (1) random cross-validation of the data in 20% training (TRN) and 80% testing (TST) (TRN20-TST80) sets, and (2) two types of core sets, “diversity” and “prediction”, including 10% and 20%, respectively, of the total collections. Accounting for population structure decreased prediction accuracy by 15–20% as compared to prediction accuracy obtained when not accounting for population structure. Accounting for population structure gave prediction accuracies for traits evaluated in one environment for TRN20-TST80 that ranged from 0.407 to 0.677 for Mexican landraces, and from 0.166 to 0.662 for Iranian landraces. Prediction accuracy of the 20% diversity core set was similar to accuracies obtained for TRN20-TST80, ranging from 0.412 to 0.654 for Mexican landraces, and from 0.182 to 0.647 for Iranian landraces. The predictive core set gave similar prediction accuracy as the diversity core set for Mexican collections, but slightly lower for Iranian collections. Prediction accuracy when incorporating G × E for DTH and DTM for Mexican landraces for TRN20-TST80 was around 0.60, which is greater than without the G × E term. For Iranian landraces, accuracies were 0.55 for the G × E model with TRN20-TST80. Results show promising prediction accuracies for potential use in germplasm enhancement and rapid introgression of exotic germplasm into elite materials.
Scientific Reports | 2016
C. Saint Pierre; Juan Burgueño; José Crossa; G. Fuentes Dávila; P. Figueroa López; E. Solís Moya; J. Ireta Moreno; V. M. Hernández Muela; V. M. Zamora Villa; Prashant Vikram; Ky L. Mathews; Carolina Paola Sansaloni; Deepmala Sehgal; Diego Jarquin; Peter Wenzl; Sukhwinder Singh
Genomic and pedigree predictions for grain yield and agronomic traits were carried out using high density molecular data on a set of 803 spring wheat lines that were evaluated in 5 sites characterized by several environmental co-variables. Seven statistical models were tested using two random cross-validation schemes. Two other prediction problems were studied, namely predicting the lines’ performance at one site with another (pairwise-site) and at untested sites (leave-one-site-out). Grain yield ranged from 3.7 to 9.0 t ha⁻¹ across sites. The best predictability was observed when the models included genotypic and pedigree data together with their interactions with sites and the environmental co-variables. The leave-one-site-out scheme increased average prediction accuracy over pairwise-site for all the traits, specifically from 0.27 to 0.36 for grain yield. Days to anthesis, maturity, and plant height had high heritability and gave the highest prediction accuracies. Genomic and pedigree models coupled with environmental co-variables gave high prediction accuracy due to high genetic correlation between sites. This study provides an example of model prediction considering climate data along with genomic and pedigree information. Such comprehensive models can be used to achieve rapid enhancement of wheat yield under current and future climate change scenarios.
G3: Genes, Genomes, Genetics | 2016
Diego Jarquin; James E. Specht; Aaron J. Lorenz
The identification and mobilization of useful genetic variation from germplasm banks for use in breeding programs is critical for future genetic gain and protection against crop pests. Plummeting costs of next-generation sequencing and genotyping are revolutionizing the way in which researchers and breeders interface with plant germplasm collections. An example of this is the high density genotyping of the entire USDA Soybean Germplasm Collection. We assessed the usefulness of 50K single nucleotide polymorphism data collected on 18,480 domesticated soybean (Glycine max) accessions and vast historical phenotypic data for developing genomic prediction models for protein, oil, and yield. Resulting genomic prediction models explained an appreciable amount of the variation in accession performance in independent validation trials, with correlations between predicted and observed reaching up to 0.92 for oil and protein and 0.79 for yield. The optimization of training set design was explored using a series of cross-validation schemes. First, it was found that the target population and environment need to be well represented in the training set. Second, genomic prediction training sets appear to be robust to the presence of data from diverse geographical locations and genetic clusters. This finding, however, depends on the influence of shattering and lodging, and may be specific to soybean with its presence of maturity groups. The distribution of 7608 nonphenotyped accessions was examined through the application of genomic prediction models. The distribution of predictions of phenotyped accessions was representative of the distribution of predictions for nonphenotyped accessions, with no nonphenotyped accessions being predicted to fall far outside the range of predictions of phenotyped accessions.
G3: Genes, Genomes, Genetics | 2015
J. Jesus Céron-Rojas; José Crossa; Vivi N. Arief; K. E. Basford; Jessica Rutkoski; Diego Jarquin; Gregorio Alvarado; Yoseph Beyene; Kassa Semagn; I. H. DeLacy
A genomic selection index (GSI) is a linear combination of genomic estimated breeding values that uses genomic markers to predict the net genetic merit and select parents from a nonphenotyped testing population. Some authors have proposed a GSI; however, they have not used simulated or real data to validate the GSI theory and have not explained how to estimate the GSI selection response and the GSI expected genetic gain per selection cycle for the unobserved traits after the first selection cycle to obtain information about the genetic gains in each subsequent selection cycle. In this paper, we develop the theory of a GSI and apply it to two simulated and four real data sets with four traits. Also, we numerically compare its efficiency with that of the phenotypic selection index (PSI) by using the ratio of the GSI response over the PSI response, and the PSI and GSI expected genetic gain per selection cycle for observed and unobserved traits, respectively. In addition, we used the Technow inequality to compare GSI vs. PSI efficiency. Results from the simulated data were confirmed by the real data, indicating that GSI was more efficient than PSI per unit of time.
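The definition at the start of this abstract, a GSI as a linear combination of genomic estimated breeding values (GEBVs), can be shown with a small sketch. The per-trait weights, the candidate names, and the helper names `genomic_selection_index` and `select_parents` are hypothetical and introduced only for illustration; the paper's estimators for selection response and expected genetic gain are not reproduced here.

```python
from typing import Dict, List, Sequence


def genomic_selection_index(gebvs: Sequence[float], weights: Sequence[float]) -> float:
    """Linear combination of one candidate's per-trait GEBVs (weights are assumed)."""
    if len(gebvs) != len(weights):
        raise ValueError("need exactly one weight per trait")
    return sum(w * g for w, g in zip(weights, gebvs))


def select_parents(
    candidates: Dict[str, Sequence[float]],
    weights: Sequence[float],
    n_select: int,
) -> List[str]:
    """Rank nonphenotyped candidates by index value and keep the top n_select."""
    scores = {
        name: genomic_selection_index(gebvs, weights)
        for name, gebvs in candidates.items()
    }
    return sorted(scores, key=lambda name: scores[name], reverse=True)[:n_select]
```

For example, with weights `[2.0, 1.0]` and candidates `{"A": [1.0, 0.0], "B": [0.0, 1.0], "C": [0.5, 0.5]}`, the index values are 2.0, 1.0, and 1.5, so selecting two parents returns `["A", "C"]`.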
G3: Genes, Genomes, Genetics | 2017
Sivakumar Sukumaran; José Crossa; Diego Jarquin; Marta S. Lopes; Matthew P. Reynolds
Developing genomic selection (GS) models is an important step in applying GS to accelerate the rate of genetic gain in grain yield in plant breeding. In this study, seven genomic prediction models under two cross-validation (CV) scenarios were tested on 287 advanced elite spring wheat lines phenotyped for grain yield (GY), thousand-grain weight (GW), grain number (GN), and thermal time for flowering (TTF) in 18 international environments (year-location combinations) in major wheat-producing countries in 2010 and 2011. Prediction models with genomic and pedigree information included main effects and interaction with environments. Two random CV schemes were applied to predict a subset of lines that were not observed in any of the 18 environments (CV1), and a subset of lines that were not observed in a set of the environments, but were observed in other environments (CV2). Genomic prediction models, including genotype × environment (G×E) interaction, had the highest average prediction ability under the CV1 scenario for GY (0.31), GN (0.32), GW (0.45), and TTF (0.27). For CV2, the average prediction ability of the model including the interaction terms was generally high for GY (0.38), GN (0.43), GW (0.63), and TTF (0.53). Wheat lines in site-year combinations in Mexico and India had relatively high prediction ability for GY and GW. Results indicated that prediction ability of lines not observed in certain environments could be relatively high for genomic selection when predicting G×E interaction in multi-environment trials.
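The two cross-validation scenarios in this abstract differ only in what is held out: CV1 removes whole lines from every environment, whereas CV2 removes individual line-environment records so that each tested line is still observed elsewhere. The sketch below shows one way to build such partitions, assuming the data are a list of `(line, environment)` records; the function names, the single train/test split (rather than repeated folds), and the seeding are illustrative choices, not the authors' procedure.

```python
import random
from collections import Counter
from typing import List, Sequence, Tuple

Record = Tuple[str, str]  # (line, environment)


def cv1_split(
    records: Sequence[Record], test_fraction: float, seed: int = 0
) -> Tuple[List[Record], List[Record]]:
    """CV1: hold out whole lines, so test lines are unobserved in all environments."""
    rng = random.Random(seed)
    lines = sorted({line for line, _ in records})
    n_test = max(1, round(test_fraction * len(lines)))
    test_lines = set(rng.sample(lines, n_test))
    train = [r for r in records if r[0] not in test_lines]
    test = [r for r in records if r[0] in test_lines]
    return train, test


def cv2_split(
    records: Sequence[Record], test_fraction: float, seed: int = 0
) -> Tuple[List[Record], List[Record]]:
    """CV2: hold out single records, keeping every line observed somewhere."""
    rng = random.Random(seed)
    shuffled = list(records)
    rng.shuffle(shuffled)
    n_test = round(test_fraction * len(records))
    remaining = Counter(line for line, _ in records)  # training records left per line
    train: List[Record] = []
    test: List[Record] = []
    for rec in shuffled:
        line = rec[0]
        # Only move a record to the test set if its line keeps a training record.
        if len(test) < n_test and remaining[line] > 1:
            test.append(rec)
            remaining[line] -= 1
        else:
            train.append(rec)
    return train, test
```

Under CV1 the model must predict a line from marker or pedigree relationships alone, while under CV2 it can also borrow information from the same line's performance in other environments, which is consistent with the higher prediction abilities reported for CV2.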
Nature Communications | 2017
Joseph L. Gage; Diego Jarquin; Cinta Romay; Aaron J. Lorenz; Edward S. Buckler; Shawn M. Kaeppler; Naser Alkhalifah; M. Bohn; Darwin A. Campbell; Jode W. Edwards; David Ertl; Sherry Flint-Garcia; Jack M. Gardiner; Byron Good; Candice N. Hirsch; James B. Holland; David C. Hooker; Joseph E. Knoll; Judith M. Kolkman; Greg R. Kruger; Nick Lauter; Carolyn J. Lawrence-Dill; E. A. Lee; Jonathan P. Lynch; Seth C. Murray; Rebecca J. Nelson; Jane Petzoldt; Torbert Rocheford; James C. Schnable; Brian T. Scully
Remarkable productivity has been achieved in crop species through artificial selection and adaptation to modern agronomic practices. Whether intensive selection has changed the ability of improved cultivars to maintain high productivity across variable environments is unknown. Understanding the genetic control of phenotypic plasticity and genotype by environment (G × E) interaction will enhance crop performance predictions across diverse environments. Here we use data generated from the Genomes to Fields (G2F) Maize G × E project to assess the effect of selection on G × E variation and characterize polymorphisms associated with plasticity. Genomic regions putatively selected during modern temperate maize breeding explain less variability for yield G × E than unselected regions, indicating that improvement by breeding may have reduced G × E of modern temperate cultivars. Trends in genomic position of variants associated with stability reveal fewer genic associations and enrichment of variants 0–5000 base pairs upstream of genes, hypothetically due to control of plasticity by short-range regulatory elements. Breeding has increased crop productivity, but whether it has also changed phenotypic plasticity is unclear. Here, the authors find maize genomic regions selected for high productivity show reduced contribution to genotype by environment variation and provide evidence for regulatory control of phenotypic stability.
G3: Genes, Genomes, Genetics | 2017
Massaine Bandeira e Souza; Jaime Cuevas; Evellyn Giselly de Oliveira Couto; Paulino Pérez-Rodríguez; Diego Jarquin; Roberto Fritsche-Neto; Juan Burgueño; José Crossa
Multi-environment trials are routinely conducted in plant breeding to select candidates for the next selection cycle. In this study, we compare the prediction accuracy of four developed genomic-enabled prediction models: (1) single-environment, main genotypic effect model (SM); (2) multi-environment, main genotypic effects model (MM); (3) multi-environment, single variance G×E deviation model (MDs); and (4) multi-environment, environment-specific variance G×E deviation model (MDe). Each of these four models was fitted using two kernel methods: a linear kernel, the Genomic Best Linear Unbiased Predictor, GBLUP (GB), and a nonlinear Gaussian kernel (GK). The eight model-method combinations were applied to two extensive Brazilian maize data sets (HEL and USP data sets), having different numbers of maize hybrids evaluated in different environments for grain yield (GY), plant height (PH), and ear height (EH). Results show that the MDe and the MDs models fitted with the Gaussian kernel (MDe-GK and MDs-GK) had the highest prediction accuracy. For GY in the HEL data set, the increase in prediction accuracy of SM-GK over SM-GB ranged from 9 to 32%. For the MM, MDs, and MDe models, the increase in prediction accuracy of GK over GB ranged from 9 to 49%. For GY in the USP data set, the increase in prediction accuracy of SM-GK over SM-GB ranged from 0 to 7%. For the MM, MDs, and MDe models, the increase in prediction accuracy of GK over GB ranged from 34 to 70%. For traits PH and EH, gains in prediction accuracy of models with GK compared to models with GB were smaller than those achieved in GY. Also, these gains in prediction accuracy decreased when a more difficult prediction problem was studied.
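The contrast between the linear (GBLUP) kernel and the Gaussian kernel can be made concrete with a small sketch that builds both relationship matrices from a marker matrix. The abstract does not give the formulas, so the forms used here are assumptions: a linear kernel G = ZZ'/p on column-centered markers, and a Gaussian kernel K_ij = exp(-h d_ij² / q), where d_ij² is the squared Euclidean distance between lines i and j and q is the median of the off-diagonal squared distances. The function names and the default bandwidth h = 1 are also illustrative.

```python
import math
import statistics
from typing import List

Matrix = List[List[float]]


def center_columns(X: Matrix) -> Matrix:
    """Subtract each marker's mean so that every column sums to zero."""
    n, p = len(X), len(X[0])
    means = [sum(row[j] for row in X) / n for j in range(p)]
    return [[row[j] - means[j] for j in range(p)] for row in X]


def linear_kernel(X: Matrix) -> Matrix:
    """Assumed GBLUP-style kernel: G = Z Z' / p with column-centered markers Z."""
    Z = center_columns(X)
    p = len(Z[0])
    return [[sum(a * b for a, b in zip(zi, zj)) / p for zj in Z] for zi in Z]


def gaussian_kernel(X: Matrix, bandwidth: float = 1.0) -> Matrix:
    """Assumed Gaussian kernel: K_ij = exp(-bandwidth * d_ij^2 / median(d^2))."""
    n = len(X)
    d2 = [[sum((a - b) ** 2 for a, b in zip(xi, xj)) for xj in X] for xi in X]
    off_diagonal = [d2[i][j] for i in range(n) for j in range(i + 1, n)]
    scale = statistics.median(off_diagonal)
    if scale <= 0:
        raise ValueError("median squared distance is zero; cannot scale distances")
    return [[math.exp(-bandwidth * d2[i][j] / scale) for j in range(n)] for i in range(n)]
```

Either matrix can serve as the covariance structure of the genotypic effects in a mixed model. The linear kernel captures additive marker similarity, while the Gaussian kernel decays nonlinearly with marker distance, which is one commonly cited reason it may pick up non-additive signal.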