Publication


Featured research published by Xia Shen.


PLOS Genetics | 2012

Inheritance beyond plain heritability: variance-controlling genes in Arabidopsis thaliana.

Xia Shen; Mats E. Pettersson; Lars Rönnegård; Örjan Carlborg

The phenotypic effect of a gene is normally described by the mean difference between alternative genotypes. A gene may, however, also influence the phenotype by causing a difference in variance between genotypes. Here, we reanalyze a publicly available Arabidopsis thaliana dataset [1] and show that genetic variance heterogeneity appears to be as common as normal additive effects on a genome-wide scale. The study also develops theory to estimate the contributions of variance differences between genotypes to the phenotypic variance, and this is used to show that individual loci can explain more than 20% of the phenotypic variance. Two well-studied systems, cellular control of molybdenum level by the ion-transporter MOT1 and flowering-time regulation by the FRI-FLC expression network, and a novel association for leaf serration are used to illustrate the contribution of major individual loci, expression pathways, and gene-by-environment interactions to the genetic variance heterogeneity.


Genetics | 2013

A Novel Generalized Ridge Regression Method for Quantitative Genetics

Xia Shen; Moudud Alam; Freddy Fikse; Lars Rönnegård

As the molecular marker density grows, there is a strong need in both genome-wide association studies and genomic selection to fit models with a large number of parameters. Here we present a computationally efficient generalized ridge regression (RR) algorithm for situations in which the number of parameters greatly exceeds the number of observations. The computationally demanding parts of the method depend mainly on the number of observations and not the number of parameters. The algorithm was implemented in the R package bigRR based on the previously developed package hglm. Using such an approach, a heteroscedastic effects model (HEM) was also developed, implemented, and tested. The efficiency for different data sizes was evaluated via simulation. The method was tested for a bacteria-hypersensitive trait in a publicly available Arabidopsis data set including 84 inbred lines and 216,130 SNPs. The computation of all the SNP effects required <10 sec using a single 2.7-GHz core. The advantage in run time makes permutation testing feasible for such a whole-genome model, so that a genome-wide significance threshold can be obtained. HEM was found to be more robust than ordinary RR (a.k.a. SNP-best linear unbiased prediction) in terms of QTL mapping, because SNP-specific shrinkage was applied instead of a common shrinkage. The proposed algorithm was also assessed for genomic evaluation and was shown to give better predictions than ordinary RR.
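One way to see why the cost of ridge regression can depend on the number of observations n rather than the number of parameters p is the standard identity (Z'Z + λI)⁻¹Z'y = Z'(ZZ' + λI)⁻¹y, which replaces a p × p solve with an n × n one. The sketch below illustrates that identity only. It is not the bigRR or hglm implementation, it uses a single common shrinkage parameter, and the function names `solve`, `ridge_dual`, and `ridge_primal` are hypothetical.

```python
def solve(a, b):
    """Solve the linear system a x = b by Gaussian elimination with partial pivoting."""
    n = len(a)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        s = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - s) / m[i][i]
    return x


def ridge_dual(Z, y, lam):
    """Ridge estimates via an n x n system: beta = Z' (Z Z' + lam I)^-1 y."""
    n, p = len(Z), len(Z[0])
    gram = [[sum(Z[i][k] * Z[j][k] for k in range(p)) + (lam if i == j else 0.0)
             for j in range(n)] for i in range(n)]
    alpha = solve(gram, y)  # only n unknowns, however large p is
    return [sum(Z[i][k] * alpha[i] for i in range(n)) for k in range(p)]


def ridge_primal(Z, y, lam):
    """Ridge estimates via the usual p x p system: beta = (Z'Z + lam I)^-1 Z'y."""
    n, p = len(Z), len(Z[0])
    ztz = [[sum(Z[i][j] * Z[i][k] for i in range(n)) + (lam if j == k else 0.0)
            for k in range(p)] for j in range(p)]
    zty = [sum(Z[i][j] * y[i] for i in range(n)) for j in range(p)]
    return solve(ztz, zty)
```

Both functions return the same estimates. With 84 lines and 216,130 SNPs, as in the data set mentioned above, the dual form would need only an 84 × 84 solve once the Gram matrix is formed.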


BMC Genomics | 2013

Genetic dissection of growth traits in a Chinese indigenous × commercial broiler chicken cross

Zheya Sheng; Mats E. Pettersson; Xiaoxiang Hu; Chenglong Luo; Hao Qu; Dingming Shu; Xia Shen; Örjan Carlborg; Ning Li

Background: In China, consumers often prefer indigenous broiler chickens over commercial breeds, as they have characteristic meat qualities requested within traditional culinary customs. However, the growth-rate of these indigenous breeds is slower than that of the commercial broilers, which means they have not yet reached their full economic value. Therefore, combining the valuable meat quality of the native chickens with the efficiency of the commercial broilers is of interest. In this study, we generated an F2 intercross between the slow growing native broiler breed, Huiyang Beard chicken, and the fast growing commercial broiler breed, High Quality chicken Line A, and used it to map loci explaining the difference in growth rate between these breeds. Results: A genome scan to identify main-effect loci affecting 24 growth-related traits revealed nine distinct QTL on six chromosomes. Many QTL were pleiotropic and conformed to the correlation patterns observed between phenotypes. Most of the mapped QTL were found in locations where growth QTL have been reported in other populations, although the effects were greater in this population. A genome scan for pairs of interacting loci identified a number of additional QTL in 10 other genomic regions. The epistatic pairs explained 6–8% of the residual phenotypic variance. Seven of the 10 epistatic QTL mapped in regions containing candidate genes in the ubiquitin mediated proteolysis pathway, suggesting the importance of this pathway in the regulation of growth in this chicken population. Conclusions: The main-effect QTL detected using a standard one-dimensional genome scan accounted for a significant fraction of the observed phenotypic variance in this population. Furthermore, genes in known pathways present interesting candidates for further exploration. This study has thus located several QTL regions as promising candidates for further study, which will increase our understanding of the genetic mechanisms underlying growth-related traits in chickens.


Heredity | 2015

Identification of quantitative genetic components of fitness variation in farmed, hybrid and native salmon in the wild

Francois Besnier; Kevin A. Glover; Sigbjørn Lien; Matthew Kent; Michael M. Hansen; Xia Shen; Øystein Skaala

Feral animals represent an important problem in many ecosystems due to interbreeding with wild conspecifics. Hybrid offspring from wild and domestic parents are often less adapted to the local environment and can ultimately reduce the fitness of the native population. This problem is an important concern in Norway, where each year, hundreds of thousands of farm Atlantic salmon escape from fish farms. Feral fish outnumber wild populations, leading to a possible loss of local adaptive genetic variation and erosion of genetic structure in wild populations. Studying the genetic factors underlying relative performance between wild and domesticated conspecifics can help to better understand how domestication modifies the genetic background of populations, and how it may alter their ability to adapt to the natural environment. Here, based upon a large-scale release of wild, farm and wild × farm salmon crosses into a natural river system, a genome-wide quantitative trait locus (QTL) scan was performed on the offspring of 50 full-sib families, for traits related to fitness (length, weight, condition factor and survival). Six QTLs were detected as significant contributors to the phenotypic variation of the first three traits, explaining collectively between 9.8 and 14.8% of the phenotypic variation. A seventh QTL had a significant contribution to the variation in survival, and is regarded as a key factor for understanding the fitness variability observed among salmon in the river. Interestingly, strong allelic correlation within one of the QTL regions in farmed salmon might reflect a recent selective sweep due to artificial selection.


Frontiers in Genetics | 2013

Beware of risk for increased false positive rates in genome-wide association studies for phenotypic variability.

Xia Shen; Örjan Carlborg

Performing genome-wide association studies (GWAS) to identify genes regulating the between-genotype variability, rather than the mean, is a promising new approach for dissecting the genetics of complex traits. Using this strategy, Yang et al. (2012) successfully identified and replicated the FTO locus and showed that it has a role in regulating the between-genotype variance heterogeneity of human body mass index using a parametric regression model. This finding illustrates the potential clinical contribution of this type of inheritance and that it is not only a feature of model organisms (e.g., Queitsch et al., 2002; Sangster et al., 2008; Gangaraju et al., 2011; Jimenez-Gomez et al., 2011; Christine et al., 2012; Shen et al., 2012). As it is likely that this paper will increase interest in applying this methodology in other human and experimental populations, we think that it is important to make prospective users aware that one needs to be careful when applying similar methodology to smaller datasets than those used by Yang et al. Yang et al. (2012) noticed that the mapping of variance-controlling loci is prone to inflated test statistics when the minor allele frequency (MAF) is small, but provided no further explanation for this. Here, we will briefly explain why this observation is only half true, why GWAS analyses to detect variance heterogeneity are inherently sensitive to unbalanced data, and why researchers aiming to perform similar analyses need to be careful to avoid reporting false positive signals. The basis for the sensitivity of variance-heterogeneity GWAS analyses is that the commonly applied statistical tests for variance heterogeneity, including e.g., regression using the squared Z-score, the Levene test (Levene, 1960) and the Brown–Forsythe test (Brown and Forsythe, 1974), are biased when applied to imbalanced samples.
The major reason for this is that the distribution of the variance often deviates from normality as it: (1) is bounded at zero; (2) has a distribution skewed to the right; (3) has a variance depending on its mean. Such deviations lead to violations of, e.g., the Gauss–Markov assumptions in a regression model (Plackett, 1950), which could cause problems such as those highlighted here. This bias is usually not discussed in the standard statistics literature as it appears only when the samples are severely imbalanced and is not sufficiently strong to be of importance when the tests are used in situations without excessive multiple-testing. GWAS analyses, however, go well beyond normal statistical theory by doing hundreds of thousands to millions of tests in severely imbalanced samples. As we will show below, these situations could lead to problems with type I errors, even when stringent Bonferroni-corrected thresholds are used, unless caution is taken in the design of the study and in the quality control of the results. To illustrate this inherent problem in the statistical methodology used to test for variance heterogeneity, we used simple simulations in two populations: one with two genotypes (AA and BB) and one with three genotypes (AA, AB, and BB). In the simulations, the number of individuals in the minor genotype class (NMG) was varied in populations of increasing sizes. Phenotypes were simulated as pure noise from a standard normal distribution, i.e., all significant signals are false positives as no genetic effect was simulated. We performed 1,000,000 tests for a variance difference for each combination of population size and NMG. The number of tests that exceeded the Bonferroni-corrected significance threshold for 1,000,000 independent tests was counted to provide an estimate of the expected number of false positive signals in a genome scan.
As shown in Figure 1A, when there are only two genotype classes, the type I error rate can be very large if the NMG contains fewer than 100 observations when using regression on the squared Z-score, and this cannot be overcome by increasing the total sample size. The Levene and Brown–Forsythe tests also show such an inflation of false positives (Figure 1B), but use of a Gamma regression model, which accounts for the fact that the squared Z-score follows a chi-square distribution, overcomes this problem. Populations with three genotypes will, in practice, be more robust when the allele substitution model implemented in most GWAS software is used (i.e., when regression on all three genotypes is used to estimate the additive effect). Inflated type I error rates are then observed only when the intermediate-size genotype class (i.e., in practice most often the heterozygotes) contains fewer than 100 individuals (Figures 1C–E). It should be noted, however, that if the additive genetic effect is estimated as a contrast between the homozygotes (ignoring heterozygotes) or if the dominance effect is included in the model, the bias will be determined by NMG in the same way as when only two genotype classes are present in the population. In our simulations, false signals appear only when the number of observations is lower in the high-variance class (not shown). When the low-variance class has fewer observations, the test is underpowered, which is a likely reason for the lack of false positives. This asymmetry in power has earlier been discussed by Shen et al. (2012).
Figure 1. -log10 P-value distribution for different scenarios of GWAS for phenotypic variability. Different sample sizes (n) and numbers of individuals in the minor genotype class (NMG = nfAA) were simulated. 1,000,000 replicates for each combination of ...
In practice this means that researchers aiming to perform a GWAS for detection of genes affecting the between-genotype variance difference need to be aware that they may take a considerable risk of obtaining excessive numbers of false positives when the allele frequencies differ and the NMG is associated with the high-variance estimate. This applies even when stringent multiple-testing corrections are used. We therefore advise that results should be interpreted with caution when (i) the genetic effect in the model is a contrast between two genotype classes and there are fewer than 100 observations in the minor genotype class, or (ii) the genetic effect in the model is estimated using observations from three genotype classes and there are fewer than 100 observations in the intermediate-size genotype class. In such situations, a Gamma generalized linear model (GLM) should be applied to further examine the results.
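The two-genotype simulation described above can be sketched at a much smaller scale as follows. This is an illustration under stated assumptions, not the authors' code: phenotypes are pure noise, the test is a pooled-variance two-sample t statistic on squared z-scores (equivalent to regressing on a 0/1 genotype code), and significance is judged against a normal-approximation critical value rather than an exact P-value or a Bonferroni threshold for 1,000,000 tests. All function names and parameter choices are hypothetical.

```python
import random
from statistics import NormalDist, fmean


def squared_z(values):
    """Standardize the phenotypes and return their squared z-scores."""
    mu = fmean(values)
    sd = (sum((v - mu) ** 2 for v in values) / (len(values) - 1)) ** 0.5
    return [((v - mu) / sd) ** 2 for v in values]


def t_statistic(group_a, group_b):
    """Pooled-variance two-sample t statistic for a difference in group means."""
    na, nb = len(group_a), len(group_b)
    ma, mb = fmean(group_a), fmean(group_b)
    ss = sum((v - ma) ** 2 for v in group_a) + sum((v - mb) ** 2 for v in group_b)
    pooled = ss / (na + nb - 2)
    return (ma - mb) / (pooled * (1 / na + 1 / nb)) ** 0.5


def false_positive_rate(n, n_minor, reps, alpha, rng):
    """Share of null replicates declared significant at level alpha.

    The first n_minor individuals form the minor genotype class; since no
    genetic effect is simulated, every significant replicate is a false positive.
    """
    crit = NormalDist().inv_cdf(1 - alpha / 2)  # normal approximation (assumption)
    hits = 0
    for _ in range(reps):
        z2 = squared_z([rng.gauss(0.0, 1.0) for _ in range(n)])
        if abs(t_statistic(z2[:n_minor], z2[n_minor:])) > crit:
            hits += 1
    return hits / reps
```

Calling `false_positive_rate(200, 5, 4000, 0.001, random.Random(1))` and comparing it with the balanced case `n_minor=100` gives a rough feel for the inflation. The replicate counts here are far too small to reproduce the genome-wide thresholds discussed in the text.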


Frontiers in Genetics | 2013

PASE: a novel method for functional prediction of amino acid substitutions based on physicochemical properties.

Xidan Li; Marcin Kierczak; Xia Shen; Muhammad Ahsan; Örjan Carlborg; Stefan Marklund

Background: Non-synonymous single-nucleotide polymorphisms (nsSNPs) within the coding regions of genes causing amino acid substitutions (AASs) may have a large impact on protein function. The possibilities to identify nsSNPs across genomes have increased notably with the advent of next-generation sequencing technologies. Thus, there is a strong need for efficient bioinformatics tools to predict the functional effect of AASs. Such tools can be used to identify the most promising candidate mutations for further experimental validation. Results: Here we present prediction of AAS effects (PASE), a novel method that predicts the effect of an AAS based on physicochemical property changes. Evaluation of PASE, using a few AASs of known phenotypic effects and 3338 human AASs, for which functional effects have previously been scored with the widely used SIFT and PolyPhen tools, shows that PASE is a useful method for functional prediction of AASs. We also show that the predictions can be further improved by combining PASE with information about evolutionary conservation. Conclusion: PASE is a novel algorithm for predicting functional effects of AASs, which can be used for pinpointing the most interesting candidate mutations. PASE predictions are based on changes in seven physicochemical properties and can improve predictions from many other available tools, which are based on evolutionary conservation. Using available experimental data and predictions from the already existing tools, we demonstrate that PASE is a useful method for predicting functional effects of AASs, even when a limited number of query sequence homologs/orthologs are available.


BMC Research Notes | 2011

qtl.outbred: Interfacing outbred line cross data with the R/qtl mapping software.

Ronald M. Nelson; Xia Shen; Örjan Carlborg

Background: qtl.outbred is an extendible interface in the statistical environment R for combining quantitative trait loci (QTL) mapping tools. It is built as an umbrella package that enables outbred genotype probabilities to be calculated and/or imported into the software package R/qtl. Findings: Using qtl.outbred, the genotype probabilities from outbred line cross data can be calculated by interfacing with a new and efficient algorithm developed for analyzing arbitrarily large datasets (included in the package) or imported from other sources such as the web-based tool GridQTL. Conclusion: qtl.outbred will improve the speed of calculating probabilities and the ability to analyse large future datasets. This package enables the user to analyse outbred line cross data accurately, but with effort similar to that required for inbred line cross data.


G3: Genes, Genomes, Genetics | 2013

MAPfastR: quantitative trait loci mapping in outbred line crosses

Ronald M. Nelson; Carl Nettelblad; Mats E. Pettersson; Xia Shen; Lucy Crooks; Francois Besnier; José M. Álvarez-Castro; Lars Rönnegård; Weronica Ek; Zheya Sheng; Marcin Kierczak; Sverker Holmgren; Örjan Carlborg

MAPfastR is a software package developed to analyze quantitative trait loci data from inbred and outbred line-crosses. The package includes a number of modules for fast and accurate quantitative trait loci analyses. It has been developed in the R language for fast and comprehensive analyses of large datasets. MAPfastR is freely available at: http://www.computationalgenetics.se/?page_id=7


F1000Research | 2013

Issues with data transformation in genome-wide association studies for phenotypic variability.

Xia Shen; Lars Rönnegård

The purpose of this correspondence is to discuss and clarify a few points about data transformation used in genome-wide association studies, especially for phenotypic variability. By commenting on the recent publication by Sun et al. in the American Journal of Human Genetics, we emphasize the importance of statistical power in detecting functional loci and the real meaning of the scale of the phenotype in practice.


bioRxiv | 2014

Highly epistatic genetic architecture of root length in Arabidopsis thaliana

Jennifer Lachowiec; Xia Shen; Christine Queitsch; Örjan Carlborg

Efforts to identify loci underlying complex traits generally assume that most genetic variance is additive. This is despite the fact that non-additive genetic effects, such as epistatic interactions and developmental noise, are also likely to make important contributions to the phenotypic variability. Analyses beyond additivity require additional care in the design and collection of data, and introduce significant analytical and computational challenges in the statistical analyses. Here, we have conducted a study that, by focusing on a model complex trait that allows precise phenotyping across many replicates and by applying advanced analytical tools capable of capturing epistatic interactions, overcomes these challenges. Specifically, we examined the genetic determinants of Arabidopsis thaliana root length, considering both trait mean and variance as phenotypes. Estimation of the narrow- and broad-sense heritabilities of mean root length found that only the non-additive variance was significantly different from zero. Also, no loci were found to contribute to mean root length using a standard additive-model-based genome-wide association analysis (GWAS). We could, however, identify one locus regulating developmental noise (root length variance) and seven loci contributing to root-length mean through epistatic interactions. Four of the epistatic loci were also experimentally confirmed. The locus associated with root length variance contains a candidate gene that, when mutated, appears to decrease developmental noise. This is particularly interesting as most other known noise regulators in multicellular organisms increase noise when mutated. A mutant analysis of candidate genes within the seven epistatic loci identified four genes that affected root development, including three without previously described root phenotypes.
In summary, we identify several novel genes affecting root development, demonstrate the benefits of advanced analytical tools to study the genetic determinants of complex traits, and show that epistatic interactions can be a major determinant of complex traits in natural A. thaliana populations.
Efforts to identify loci underlying complex traits generally assume that most genetic variance is additive. Here, we examined the genetics of Arabidopsis thaliana root length and found that the narrow-sense heritability for this trait was statistically zero. This low additive genetic variance likely explains why no associations to root length could be found using standard additive-model-based genome-wide association (GWA) approaches. However, the broad-sense heritability for root length was significantly larger, and we therefore also performed an epistatic GWA analysis to map loci contributing to the epistatic genetic variance. This analysis revealed four interacting pairs involving seven chromosomal loci that passed a standard multiple-testing corrected significance threshold. Explorations of the genotype-phenotype maps for these pairs revealed that the detected epistasis cancelled out the additive genetic variance, explaining why these loci were not detected in the additive GWA analysis. Small population sizes, such as in our experiment, increase the risk of identifying false epistatic interactions due to testing for associations with very large numbers of multi-marker genotypes in few phenotyped individuals. Therefore, we estimated the false-positive risk using a new statistical approach that suggested half of the associated pairs to be true positive associations. Our experimental evaluation of candidate genes within the seven associated loci suggests that this estimate is conservative; we identified functional candidate genes that affected root development in four loci that were part of three of the pairs.
In summary, statistical epistatic analyses were found to be indispensable for confirming known, and identifying several new, functional candidate genes for root length using a population of wild-collected A. thaliana accessions. We also illustrated how epistatic cancellation of the additive genetic variance resulted in an insignificant narrow-sense, but significant broad-sense heritability that could be dissected into the contributions of several individual loci using a combination of careful statistical epistatic analyses and functional genetic experiments.
Author summary: Complex traits, such as many human diseases or climate adaptation and production traits in crops, arise through the action and interaction of many genes and environmental factors. Classic approaches to identify contributing genes generally assume that these factors contribute mainly additive genetic variance. Recent methods, such as genome-wide association studies, often adhere to this additive genetics paradigm. However, additive models of complex traits do not reflect that genes can also contribute with non-additive genetic variance. In this study, we use Arabidopsis thaliana to determine the additive and non-additive genetic contributions to the phenotypic variation in root length. Surprisingly, much of the observed phenotypic variation in root length across genetically divergent strains was explained by epistasis. We mapped seven loci contributing to the epistatic genetic variance and validated four genes in these loci with mutant analysis. For three of these genes, this is their first implication in root development. Together, our results emphasize the importance of considering both non-additive and additive genetic variance when dissecting complex trait variation, in order not to lose sensitivity in genetic analyses.
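The idea that epistasis can cancel the additive genetic variance, mentioned in the abstract above, can be illustrated with a toy two-locus model. The sketch assumes homozygous inbred lines, an allele frequency of 0.5 at both loci, and independence between loci, so the four two-locus genotypes are equally frequent. The genotype-phenotype map used in the usage note is hypothetical and is not taken from the study.

```python
from itertools import product


def variance_components(genotypic_value):
    """Return (total genetic variance, additive variance) for two biallelic loci.

    Assumes four equally frequent two-locus genotypes coded 0/1 at each locus.
    """
    genotypes = list(product((0, 1), repeat=2))
    values = [genotypic_value(a, b) for a, b in genotypes]
    mean = sum(values) / 4
    total = sum((v - mean) ** 2 for v in values) / 4

    additive = 0.0
    for locus in (0, 1):
        # With independent, balanced loci the additive effect of a locus is
        # the difference between the marginal means of its two alleles.
        m1 = sum(v for g, v in zip(genotypes, values) if g[locus] == 1) / 2
        m0 = sum(v for g, v in zip(genotypes, values) if g[locus] == 0) / 2
        effect = m1 - m0
        additive += effect ** 2 * 0.25  # variance of a 0/1 code at frequency 0.5
    return total, additive
```

For a map in which only the two mixed genotypes raise the trait, `variance_components(lambda a, b: float(a != b))` has non-zero total genetic variance but zero additive variance, so an additive scan would see nothing at either locus. A purely additive map such as `lambda a, b: float(a + b)` gives equal total and additive variance.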

Collaboration


Top Co-Authors

Mats E. Pettersson, Swedish University of Agricultural Sciences
Francois Besnier, Swedish University of Agricultural Sciences
Freddy Fikse, Swedish University of Agricultural Sciences
Marcin Kierczak, Swedish University of Agricultural Sciences
Ronald M. Nelson, Swedish University of Agricultural Sciences