Publication


Featured research published by Charles Chen.


G3: Genes, Genomes, Genetics | 2013

Genomic Prediction in Maize Breeding Populations with Genotyping-by-Sequencing

José Crossa; Yoseph Beyene; Semagn Kassa; Paulino Pérez; John Hickey; Charles Chen; Gustavo de los Campos; Juan Burgueño; Vanessa S. Windhausen; Edward S. Buckler; Jean-Luc Jannink; Marco A. Lopez Cruz; Raman Babu

Genotyping-by-sequencing (GBS) technologies have proven capacity for delivering large numbers of marker genotypes with potentially less ascertainment bias than standard single nucleotide polymorphism (SNP) arrays. Therefore, GBS has become an attractive alternative technology for genomic selection. However, the use of GBS data poses important challenges, and the accuracy of genomic prediction using GBS is currently undergoing investigation in several crops, including maize, wheat, and cassava. The main objective of this study was to evaluate various methods for incorporating GBS information and compare them with pedigree models for predicting genetic values of lines from two maize populations evaluated for different traits measured in different environments (experiments 1 and 2). Given that GBS data come with a large percentage of uncalled genotypes, we evaluated methods using nonimputed, imputed, and GBS-inferred haplotypes of different lengths (short or long). GBS and pedigree data were incorporated into statistical models using either the genomic best linear unbiased predictors (GBLUP) or the reproducing kernel Hilbert spaces (RKHS) regressions, and prediction accuracy was quantified using cross-validation methods. The following results were found: (1) relative to pedigree or marker-only models, there were consistent gains in prediction accuracy by combining pedigree and GBS data; (2) there was increased predictive ability when using imputed or nonimputed GBS data over inferred haplotypes in experiment 1, or nonimputed GBS and information-based imputed short and long haplotypes, as compared to the other methods, in experiment 2; (3) the level of prediction accuracy achieved using GBS data in experiment 2 was comparable to that reported by previous authors who analyzed this data set using SNP arrays; and (4) GBLUP and RKHS models with pedigree and with nonimputed and imputed GBS data provided the best prediction correlations for the three traits in experiment 1, whereas for experiment 2 RKHS provided slightly better prediction than GBLUP for drought-stressed environments, and both models provided similar predictions in well-watered environments.


Annals of Forest Science | 2008

Pedigree and mating system analyses in a western larch (Larix occidentalis Nutt.) experimental population

Tomas Funda; Charles Chen; Cherdsak Liewlaksaneeyanawin; Ahmed M. A. Kenawy; Yousry A. El-Kassaby

• The mating pattern and gene flow in a western larch (Larix occidentalis Nutt.) experimental population were studied with the aid of microsatellite markers and a combined paternity and mating system analysis. The male gametic contribution, which is commonly difficult to assess, was determined with 95% confidence, and its impact on genetic gain and diversity was evaluated.
• Male fertility success rate ranged between 0 and 11%. An imbalance in male reproductive output was observed, with 50% of the pollen produced by the top 5% of males, while the lower 39% of males produced only 10% of the pollen.
• A significant difference was observed between male effective population size (genetic diversity) estimates from paternity assignment and those based on the population's census number (21 vs. 41); however, this difference did not affect estimates of genetic gain.
• A total of 221 full-sib families were identified (sample size range: 1–8) and were nested among the 14 seed donors studied.
• A combined paternity and mating system analysis is recommended to provide better insight into seed orchards' mating dynamics. While pollen flow tends to inflate the mating system's outcrossing rate, the paternity analysis effectively determined the rate and magnitude of contamination across receptive females.
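The gap between census number and effective population size reported in this abstract can be illustrated with a short sketch. It assumes the common estimator N_e = 1 / Σ p_i², where p_i is each male's share of the assigned offspring; the abstract does not state which estimator the authors used, and the function name and toy counts below are hypothetical.

```python
def effective_number(contributions):
    """Effective number of parents, 1 / sum(p_i^2).

    `contributions` holds the number of offspring assigned to each male
    (an assumed input format, e.g. counts from a paternity assignment).
    """
    total = sum(contributions)
    if total <= 0:
        raise ValueError("at least one offspring must be assigned")
    return 1.0 / sum((c / total) ** 2 for c in contributions)


# Equal contributions: the effective number equals the census number.
balanced = effective_number([10, 10, 10, 10])

# Skewed contributions: one male dominates, so the effective number
# falls far below the census number of four.
skewed = effective_number([97, 1, 1, 1])
```

The more unequal the pollen contributions, the further the effective number drops below the census count, which is the pattern the abstract describes.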


BMC Genomics | 2015

Prediction accuracies for growth and wood attributes of interior spruce in space using genotyping-by-sequencing

Omnia Gamal El-Dien; Blaise Ratcliffe; Jaroslav Klápště; Charles Chen; Ilga Porth; Yousry A. El-Kassaby

Background: Genomic selection (GS) in forestry can substantially reduce the length of the breeding cycle and increase gain per unit time through early selection and greater selection intensity, particularly for traits of low heritability and late expression. Affordable next-generation sequencing technologies have made it possible to genotype large numbers of trees at a reasonable cost.

Results: Genotyping-by-sequencing was used to genotype 1,126 Interior spruce trees representing 25 open-pollinated families planted over three sites in British Columbia, Canada. Four imputation algorithms were compared: mean value (MI), singular value decomposition (SVD), expectation maximization (EM), and a newly derived, family-based k-nearest neighbor (kNN-Fam). Trees were phenotyped for several yield and wood attributes. Single- and multi-site GS prediction models were developed using the Ridge Regression Best Linear Unbiased Predictor (RR-BLUP) and the Generalized Ridge Regression (GRR) to test different assumptions about trait architecture. Finally, using PCA, multi-trait GS prediction models were developed. The EM and kNN-Fam imputation methods were superior for 30 and 60% missing data, respectively. The RR-BLUP GS prediction model produced better accuracies than the GRR, indicating that the genetic architecture for these traits is complex. GS prediction accuracies for multi-site models were high and better than those of single-site models, while cross-site predictability produced the lowest accuracies, reflecting type-b genetic correlations, and was deemed unreliable. The incorporation of genomic information in quantitative genetics analyses produced more realistic heritability estimates, as the half-sib pedigree tended to inflate the additive genetic variance and subsequently both heritability and gain estimates. Principal component scores as representatives of multi-trait GS prediction models produced surprising results, where negatively correlated traits could be concurrently selected for using PCA2 and PCA3.

Conclusions: The application of GS to open-pollinated family testing, the simplest form of tree improvement evaluation methods, was proven to be effective. Prediction accuracies obtained for all traits greatly support the integration of GS in tree breeding. While the within-site GS prediction accuracies were high, the results clearly indicate that the ability of single-site GS models to predict other sites is unreliable, supporting the use of a multi-site approach. Principal component scores provided an opportunity for the concurrent selection of traits with different phenotypic optima.


Molecular Ecology Resources | 2009

Development and characterization of microsatellite loci in western larch (Larix occidentalis Nutt.)

Charles Chen; Cherdsak Liewlaksaneeyanawin; Tomas Funda; A. Kenawy; C. H. Newton; Yousry A. El-Kassaby

Western larch (Larix occidentalis Nutt.) is an important ecological and commercial species in the Pacific Northwest. We isolated nine microsatellite loci with variable polymorphism ranging from five to 19 alleles per locus. Observed and expected heterozygosities averaged 0.42 and 0.64 and ranged from 0.11 to 0.83 and from 0.48 to 0.80, respectively. These markers, along with those already existing, will be useful for the species’ gene resource management activities.
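As a rough illustration of the two statistics reported above, the sketch below computes observed and expected heterozygosity for one diploid locus. It assumes genotypes are given as pairs of allele labels and uses the uncorrected expected heterozygosity 1 - Σ p_i² (no small-sample correction); the function name and toy data are hypothetical and do not come from the paper.

```python
from collections import Counter


def heterozygosity(genotypes):
    """Return (observed, expected) heterozygosity for one diploid locus.

    `genotypes` is a list of (allele_a, allele_b) pairs, one per individual.
    Expected heterozygosity is 1 - sum(p_i^2) over allele frequencies p_i.
    """
    n = len(genotypes)
    if n == 0:
        raise ValueError("no genotypes supplied")
    # Observed: fraction of individuals carrying two different alleles.
    h_obs = sum(1 for a, b in genotypes if a != b) / n
    # Expected: based on allele frequencies over all 2n allele copies.
    counts = Counter(allele for pair in genotypes for allele in pair)
    h_exp = 1.0 - sum((c / (2 * n)) ** 2 for c in counts.values())
    return h_obs, h_exp


# Toy locus with two alleles (fragment sizes 150 and 154, hypothetical).
h_obs, h_exp = heterozygosity([(150, 150), (150, 154), (154, 154), (150, 154)])
```

An observed value well below the expected one, as in the averages reported here (0.42 vs. 0.64), indicates fewer heterozygotes than the allele frequencies alone would predict.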


G3: Genes, Genomes, Genetics | 2016

Implementation of the Realized Genomic Relationship Matrix to Open-Pollinated White Spruce Family Testing for Disentangling Additive from Nonadditive Genetic Effects

Omnia Gamal El-Dien; Blaise Ratcliffe; Jaroslav Klápště; Ilga Porth; Charles Chen; Yousry A. El-Kassaby

The open-pollinated (OP) family testing combines the simplest known progeny evaluation and quantitative genetics analyses as candidates’ offspring are assumed to represent independent half-sib families. The accuracy of genetic parameter estimates is often questioned as the assumption of “half-sibling” in OP families may often be violated. We compared the pedigree- vs. marker-based genetic models by analysing 22-yr height and 30-yr wood density for 214 white spruce [Picea glauca (Moench) Voss] OP families represented by 1694 individuals growing on one site in Quebec, Canada. Assuming half-sibling, the pedigree-based model was limited to estimating the additive genetic variances which, in turn, were grossly overestimated as they were confounded by very minor dominance and major additive-by-additive epistatic genetic variances. In contrast, the implemented genomic pairwise realized relationship models allowed the disentanglement of additive from all nonadditive factors through genetic variance decomposition. The marker-based models produced more realistic narrow-sense heritability estimates and, for the first time, allowed estimating the dominance and epistatic genetic variances from OP testing. In addition, the genomic models showed better prediction accuracies compared to pedigree models and were able to predict individual breeding values for new individuals from untested families, which was not possible using the pedigree-based model. Clearly, the use of marker-based relationship approach is effective in estimating the quantitative genetic parameters of complex traits even under simple and shallow pedigree structure.


Plant Journal | 2014

Adaptive divergence with gene flow in incipient speciation of Miscanthus floridulus/sinensis complex (Poaceae).

Chao Li Huang; Chuan Wen Ho; Yu-Chung Chiang; Yasumasa Shigemoto; Tsai Wen Hsu; Chi-Chuan Hwang; Xue-Jun Ge; Charles Chen; Tai Han Wu; Chang-Hung Chou; Hao Jen Huang; Takashi Gojobori; Naoki Osada; Tzen Yuh Chiang

Young incipient species provide ideal materials for untangling the process of ecological speciation in the presence of gene flow. The Miscanthus floridulus/sinensis complex exhibits diverse phenotypic and ecological differences despite recent divergence (approximately 1.59 million years ago). To elucidate the process of genetic differentiation during early stages of ecological speciation, we analyzed genomic divergence in the Miscanthus complex using 72 randomly selected genes from a newly assembled transcriptome. In this study, rampant gene flow was detected between species, estimated as M = 3.36 × 10⁻⁹ to 1.20 × 10⁻⁶, resulting in contradicting phylogenies across loci. Nevertheless, BEAST analyses revealed the species identity and the effects of extrinsic cohesive forces that counteracted the non-stop introgression. As expected, early in speciation with gene flow, only 3-13 loci were highly diverged; two to five outliers (approximately 2.78-6.94% of the genome) were characterized by strong linkage disequilibrium, and asymmetrically distributed among ecotypes, indicating footprints of diversifying selection. In conclusion, ecological speciation of incipient species of Miscanthus probably followed the parapatric model, whereas allopatric speciation cannot be completely ruled out, especially between the geographically isolated northern and southern M. sinensis, for which no significant gene flow across oceanic barriers was detected. Divergence between local ecotypes in early-stage speciation began at a few genomic regions under the influence of natural selection and divergence hitchhiking that overcame gene flow.


G3: Genes, Genomes, Genetics | 2017

Single-Step BLUP with Varying Genotyping Effort in Open-Pollinated Picea glauca.

Blaise Ratcliffe; Omnia Gamal El-Dien; Eduardo P. Cappa; Ilga Porth; Jaroslav Klapste; Charles Chen; Yousry A. El-Kassaby

Maximization of genetic gain in forest tree breeding programs is contingent on the accuracy of the predicted breeding values and precision of the estimated genetic parameters. We investigated the effect of the combined use of contemporary pedigree information and genomic relatedness estimates on the accuracy of predicted breeding values and precision of estimated genetic parameters, as well as rankings of selection candidates, using single-step genomic evaluation (HBLUP). In this study, two traits with diverse heritabilities [tree height (HT) and wood density (WD)] were assessed at various levels of family genotyping efforts (0, 25, 50, 75, and 100%) from a population of white spruce (Picea glauca) consisting of 1694 trees from 214 open-pollinated families, representing 43 provenances in Québec, Canada. The results revealed that HBLUP bivariate analysis is effective in reducing the known bias in heritability estimates of open-pollinated populations, as it exposes hidden relatedness, potential pedigree errors, and inbreeding. The addition of genomic information in the analysis considerably improved the accuracy in breeding value estimates by accounting for both Mendelian sampling and historical coancestry that were not captured by the contemporary pedigree alone. Increasing family genotyping efforts were associated with continuous improvement in model fit, precision of genetic parameters, and breeding value accuracy. Yet, improvements were observed even at minimal genotyping effort, indicating that even modest genotyping effort is effective in improving genetic evaluation. The combined utilization of both pedigree and genomic information may be a cost-effective approach to increase the accuracy of breeding values in forest tree breeding programs where shallow pedigrees and large testing populations are the norm.


Molecular Breeding | 2017

Practical application of genomic selection in a doubled-haploid winter wheat breeding program

Jiayin Song; Brett F. Carver; Carol Powers; Liuling Yan; Jaroslav Klápště; Yousry A. El-Kassaby; Charles Chen

Crop improvement is a long-term, expensive institutional endeavor. Genomic selection (GS), which uses single nucleotide polymorphism (SNP) information to estimate genomic breeding values, has proven efficient in increasing genetic gain by accelerating the breeding process in animal breeding programs. As for crop improvement, with few exceptions, GS applicability remains at the stage of evaluating algorithm performance. In this study, we examined factors related to GS applicability in the line development stage for grain yield using a hard red winter wheat (Triticum aestivum L.) doubled-haploid population. The performance of GS was evaluated in two consecutive years to predict grain yield. In general, the semi-parametric reproducing kernel Hilbert space prediction algorithm outperformed parametric genomic best linear unbiased prediction. For both parametric and semi-parametric algorithms, an upward bias in predictability was apparent in within-year cross-validation, suggesting that cross-year validation is a prerequisite for more reliable prediction. Adjusting the training population's phenotype for genotype-by-environment effects had a positive impact on the GS model's predictive ability. Possibly due to marker redundancy, a selected subset of SNPs at an absolute pairwise correlation coefficient threshold value of 0.4 produced comparable results and reduced the computational burden of considering the full SNP set. Finally, in the context of an ongoing breeding and selection effort, the present study has provided a measure of confidence based on the deviation of line selection from GS results, supporting the implementation of GS in wheat variety development.
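The idea of selecting a SNP subset at an absolute pairwise correlation threshold of 0.4 can be sketched as a simple greedy filter. This is only an illustration under assumptions: the abstract does not describe the selection procedure actually used, and the function names, the order-dependent greedy rule, and the toy genotype columns are hypothetical.

```python
import math


def pearson(x, y):
    """Pearson correlation of two equal-length numeric sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return 0.0  # a monomorphic SNP is treated as uncorrelated here
    return sxy / math.sqrt(sxx * syy)


def prune_snps(columns, threshold=0.4):
    """Greedy pruning: keep a SNP only if its absolute correlation with
    every SNP already kept is at or below `threshold`.

    `columns` is a list of SNP genotype vectors (coded 0/1/2), one per SNP.
    Returns the indices of the retained SNPs; the result depends on order.
    """
    kept = []
    for j, col in enumerate(columns):
        if all(abs(pearson(col, columns[k])) <= threshold for k in kept):
            kept.append(j)
    return kept


snps = [
    [0, 1, 2, 1, 0, 2],  # SNP 0
    [0, 1, 2, 1, 0, 2],  # SNP 1: identical to SNP 0, so redundant
    [2, 1, 0, 1, 2, 0],  # SNP 2: perfectly negatively correlated with SNP 0
    [0, 2, 0, 2, 1, 1],  # SNP 3: uncorrelated with SNP 0
]
retained = prune_snps(snps, threshold=0.4)
```

Because the rule is greedy, a different marker order can retain a different subset; a production pipeline would typically also work within chromosomes or sliding windows rather than across all pairs.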


bioRxiv | 2018

SNP Variable Selection by Generalized Graph Domination

Shuzhen Sun; Zhuqi Miao; Blaise Ratcliffe; Polly Campbell; Bret Pasch; Yousry A. El-Kassaby; Balabhaskar Balasundaram; Charles Chen

High-throughput sequencing technology has revolutionized both medical and biological research by generating exceedingly large numbers of genetic variants. The resulting datasets share a number of common characteristics that might lead to poor generalization capacity. Concerns include noise accumulated due to the large number of predictors, sparse information regarding the p ≫ n problem, and overfitting and model mis-identification resulting from spurious collinearity. Additionally, complex correlation patterns are present among variables. As a consequence, reliable variable selection techniques play a pivotal role in predictive analysis, generalization capability, and robustness in clustering, as well as interpretability of the derived models. The k-dominating set, a parameterized graph-theoretic generalization model, was used to model SNP (single nucleotide polymorphism) data as a similarity network and to search for representative SNP variables. In particular, each SNP was represented as a vertex in the graph, and (dis)similarity measures such as correlation coefficients or pairwise linkage disequilibrium were estimated to describe the relationship between each pair of SNPs; a pair of vertices is adjacent, i.e. joined by an edge, if the pairwise similarity measure exceeds a user-specified threshold. A minimum k-dominating set in the SNP graph was then defined as the smallest subset of SNPs such that every SNP excluded from the subset has at least k neighbors among the selected ones. The strength of k-dominating set selection in identifying independent variables, and in culling representative variables that are highly correlated with others, was demonstrated with a simulated dataset. The advantages of k-dominating set variable selection were also illustrated in two applications: pedigree reconstruction using SNP profiles of 1,372 Douglas-fir trees, and species delineation for 226 grasshopper mouse samples. C++ source code that implements SNP-SELECT and uses the Gurobi™ optimization solver for the k-dominating set variable selection is available (https://github.com/transgenomicsosu/SNP-SELECT).
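The k-dominating set definition in this abstract can be made concrete with a small sketch. Note that this is a greedy heuristic written for illustration, not the exact optimization that SNP-SELECT solves with Gurobi, so it may return a set larger than the minimum. The function names and the toy star graph are hypothetical; the graph is assumed to be given as a dictionary mapping each SNP to the set of SNPs it is adjacent to.

```python
def is_k_dominating(adjacency, selected, k):
    """Check the definition: every vertex outside `selected` has at
    least k neighbors inside `selected`."""
    return all(
        len(adjacency[v] & selected) >= k
        for v in adjacency
        if v not in selected
    )


def greedy_k_dominating_set(adjacency, k=1):
    """Greedy heuristic for a (not necessarily minimum) k-dominating set.

    need[v] tracks how many more selected neighbors v still requires.
    Each round adds the vertex that satisfies the most outstanding
    requirements, counting its own (which vanishes once it is selected).
    """
    need = {v: k for v in adjacency}
    selected = set()

    def gain(v):
        helped = sum(
            1 for u in adjacency[v] if u not in selected and need[u] > 0
        )
        return helped + max(need[v], 0)

    while any(need[v] > 0 for v in adjacency if v not in selected):
        best = max((v for v in adjacency if v not in selected), key=gain)
        selected.add(best)
        for u in adjacency[best]:
            need[u] -= 1
    return selected


# Toy similarity graph: SNP 0 is correlated with SNPs 1 to 4 (a star).
star = {0: {1, 2, 3, 4}, 1: {0}, 2: {0}, 3: {0}, 4: {0}}
representatives_k1 = greedy_k_dominating_set(star, k=1)
representatives_k2 = greedy_k_dominating_set(star, k=2)
```

With k = 1 the hub SNP alone represents the whole star. With k = 2 each leaf has only one neighbor, so no leaf can be dominated and every leaf must itself be selected, which shows how larger k retains more variables.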


bioRxiv | 2018

Genomic Surveillance for Antimicrobial Resistance in Mannheimia haemolytica Using Nanopore Single Molecule Sequencing Technology

Alexander Lim; Bryan Naidenov; Haley Bates; Karyn Willyerd; Timothy A. Snider; Matthew Brian Couger; Charles Chen

Disruptive innovations in long-range, cost-effective direct template nucleic acid sequencing are transforming clinical and diagnostic medicine. A multidrug resistant strain and a pan-susceptible strain of Mannheimia haemolytica, isolated from pneumonic bovine lung samples, were respectively sequenced at 146x and 111x coverage with Oxford Nanopore Technologies MinION. De novo assembly produced a complete genome for the non-resistant strain and a nearly complete assembly for the drug resistant strain. Functional annotation using RAST (Rapid Annotations using Subsystems Technology), CARD (Comprehensive Antibiotic Resistance Database) and ResFinder databases identified genes conferring resistance to different classes of antibiotics including beta lactams, tetracyclines, lincosamides, phenicols, aminoglycosides, sulfonamides and macrolides. Antibiotic resistance phenotypes of the M. haemolytica strains were confirmed with minimum inhibitory concentration (MIC) assays. The sequencing capacity of highly portable MinION devices was verified by sub-sampling sequencing reads; potential for antimicrobial resistance determined by identification of resistance genes in the draft assemblies with as little as 5,437 MinION reads corresponded to all classes of MIC assays. The resulting quality assemblies and AMR gene annotation highlight efficiency of ultra long-read, whole-genome sequencing (WGS) as a valuable tool in diagnostic veterinary medicine.

Collaboration

Top co-authors of Charles Chen.

Yousry A. El-Kassaby, University of British Columbia
Blaise Ratcliffe, University of British Columbia
Jaroslav Klápště, Czech University of Life Sciences Prague
Omnia Gamal El-Dien, University of British Columbia
Tomas Funda, University of British Columbia
A. Kenawy, University of British Columbia
Ahmed M. A. Kenawy, University of British Columbia