Jiahan Li
Pennsylvania State University
Publications
Featured research published by Jiahan Li.
Human Genetics | 2011
Kiranmoy Das; Jiahan Li; Zhong Wang; Chunfa Tong; Guifang Fu; Yao Li; Meng Xu; Kwangmi Ahn; David T. Mauger; Runze Li; Rongling Wu
Although genome-wide association studies (GWAS) are widely used to identify the genetic and environmental etiology of a trait, several key issues related to their statistical power and biological relevance have remained unexplored. Here, we describe a novel statistical approach, called functional GWAS or fGWAS, to analyze the genetic control of traits by integrating biological principles of trait formation into the GWAS framework through mathematical and statistical bridges. fGWAS can address many fundamental questions, such as the patterns of genetic control over development, the duration of genetic effects, and what causes developmental trajectories to change or stop changing. Statistically, fGWAS displays increased power for gene detection by capitalizing on cumulative phenotypic variation in a longitudinal trait over time, and increased robustness in handling sparse longitudinal data.
Journal of Biological Dynamics | 2011
Guifang Fu; Jiangtao Luo; Arthur Berg; Zhong Wang; Jiahan Li; Kiranmoy Das; Runze Li; Rongling Wu
Functional mapping is a statistical method for mapping quantitative trait loci (QTLs) that regulate the dynamic pattern of a biological trait. This method integrates mathematical aspects of biological complexity into a mixture model for genetic mapping and tests the genetic effects of QTLs by comparing genotype-specific curve parameters. As a way of quantitatively specifying the dynamic behaviour of a system, differential equations have proved powerful for modelling and unravelling the biochemical, molecular, and cellular mechanisms of biological processes such as biological rhythms. Equipping functional mapping with biologically meaningful differential equations provides new insights into the genetic control of dynamic processes. We formulate a new functional mapping framework for a dynamic biological rhythm by incorporating a system of ordinary differential equations (ODEs). The fourth-order Runge–Kutta algorithm was implemented to estimate the parameters that define the ODE system. The new model should help in understanding the interplay between gene interactions and developmental pathways in complex biological rhythms.
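The Runge-Kutta step mentioned above can be illustrated with a minimal Python sketch. It only solves a generic ODE system numerically; in a functional-mapping fit, a solver like this would presumably be called inside the likelihood to produce genotype-specific curves whose ODE parameters are then optimized. The two-variable oscillator is a hypothetical stand-in for a biological rhythm, not the ODE system used in the paper, and all function names are illustrative.

```python
import math
from typing import Callable, List, Tuple

State = List[float]
RHS = Callable[[float, State], State]


def rk4_step(f: RHS, t: float, y: State, h: float) -> State:
    """Advance y' = f(t, y) by one classical fourth-order Runge-Kutta step."""
    k1 = f(t, y)
    k2 = f(t + h / 2, [yi + h / 2 * ki for yi, ki in zip(y, k1)])
    k3 = f(t + h / 2, [yi + h / 2 * ki for yi, ki in zip(y, k2)])
    k4 = f(t + h, [yi + h * ki for yi, ki in zip(y, k3)])
    return [yi + h / 6 * (a + 2 * b + 2 * c + d)
            for yi, a, b, c, d in zip(y, k1, k2, k3, k4)]


def integrate(f: RHS, y0: State, t0: float, t1: float,
              n_steps: int) -> List[Tuple[float, State]]:
    """Return the (t, y) trajectory on an equally spaced grid from t0 to t1."""
    h = (t1 - t0) / n_steps
    t, y = t0, list(y0)
    path = [(t, y)]
    for i in range(1, n_steps + 1):
        y = rk4_step(f, t, y, h)
        t = t0 + i * h  # recompute t to avoid accumulating rounding error
        path.append((t, y))
    return path


def oscillator(omega: float) -> RHS:
    """Hypothetical rhythm model: x' = v, v' = -omega^2 x (period 2*pi/omega)."""
    def f(t: float, y: State) -> State:
        x, v = y
        return [v, -omega ** 2 * x]
    return f


if __name__ == "__main__":
    path = integrate(oscillator(1.0), [1.0, 0.0], 0.0, 2 * math.pi, 1000)
    print(path[-1])  # after one full period the state returns close to [1.0, 0.0]
```

RK4 is used here because its error shrinks with the fourth power of the step size, so a modest number of steps already tracks a periodic solution closely.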
The Annals of Applied Statistics | 2015
Jiahan Li; Zhong Wang; Runze Li; Rongling Wu
Although genome-wide association studies (GWAS) have proven powerful for comprehending the genetic architecture of complex traits, they are challenged by a high dimension of single-nucleotide polymorphisms (SNPs) as predictors, the presence of complex environmental factors, and the longitudinal or functional nature of many complex traits or diseases. To address these challenges, we propose a high-dimensional varying-coefficient model for incorporating functional aspects of phenotypic traits into GWAS to formulate a so-called functional GWAS or fGWAS. Bayesian group lasso and the associated MCMC algorithms are developed to identify significant SNPs and estimate how they affect longitudinal traits through time-varying genetic actions. The model is generalized to analyze the genetic control of complex traits using subject-specific sparse longitudinal data. The statistical properties of the new model are investigated through simulation studies. We use the new model to analyze a real GWAS data set from the Framingham Heart Study, leading to the identification of several significant SNPs associated with age-specific changes of body mass index. The fGWAS model, equipped with Bayesian group lasso, will provide a useful tool for genetic and developmental analysis of complex traits or diseases.
Theoretical Population Biology | 2009
Wei Hou; Tian Liu; Yao Li; Qin Li; Jiahan Li; Kiranmoy Das; Arthur Berg; Rongling Wu
The structure and organization of natural plant populations can be understood by estimating the genetic parameters related to mating behavior, recombination frequency, and gene associations with DNA-based markers typed throughout the genome. We developed a statistical and computational model for estimating and testing these parameters from multilocus data collected in a natural population. This model, constructed by a maximum likelihood approach and implemented within the EM algorithm, is shown to be robust for simultaneously estimating the outcrossing rate, recombination frequencies and linkage disequilibria. The algorithm built with three or more markers allows the characterization of crossover interference in meiosis and high-order disequilibria among different genes, thus providing a powerful tool for illustrating a detailed picture of genetic diversity and organization in natural populations. Computer simulations demonstrate the statistical properties of the proposed model. This multilocus model will be useful for studying the pattern and amount of genetic variation within and among populations to further infer the evolutionary history of a plant species.
The Annals of Applied Statistics | 2014
Jiahan Li; Wei Zhong; Runze Li; Rongling Wu
With the recent advent of high-throughput genotyping techniques, genetic data for genome-wide association studies (GWAS) have become increasingly available, which calls for the development of efficient and effective statistical approaches. Although many such approaches have been developed and used to identify single-nucleotide polymorphisms (SNPs) that are associated with complex traits or diseases, few are able to detect gene-gene interactions among different SNPs. Genetic interactions, also known as epistasis, have been recognized to play a pivotal role in contributing to the genetic variation of phenotypic traits. However, because of the extremely large number of SNP-SNP combinations in GWAS, the model dimensionality can quickly become so overwhelming that no prevailing variable selection method can handle this problem. In this paper, we present a statistical framework for characterizing main genetic effects and epistatic interactions in a GWAS. Specifically, we first propose a two-stage sure independence screening (TS-SIS) procedure and generate a pool of candidate SNPs and interactions, which serve as predictors to explain and predict the phenotypes of a complex trait. We also propose a rates adjusted thresholding estimation (RATE) approach to determine the size of the reduced model selected by an independence screening. Regularization regression methods, such as LASSO or SCAD, are then applied to further identify important genetic effects. Simulation studies show that the TS-SIS procedure is computationally efficient and has an outstanding finite sample performance in selecting potential SNPs as well as gene-gene interactions. We apply the proposed framework to analyze an ultrahigh-dimensional GWAS data set from the Framingham Heart Study, and select 23 active SNPs and 24 active epistatic interactions for the body mass index variation. These results show the capability of our procedure to resolve the complexity of genetic control.
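A simplified screening step in the spirit of the procedure above can be sketched as follows, assuming SNP genotypes coded 0/1/2 in a numeric matrix. Stage one ranks SNPs by absolute marginal correlation with the phenotype (plain sure independence screening); stage two, as an assumption made here for brevity, only screens pairwise products of the SNPs retained in stage one. The published TS-SIS and its RATE threshold may define the stages and model sizes differently, and the function names are hypothetical.

```python
import itertools

import numpy as np


def sis(X: np.ndarray, y: np.ndarray, d: int) -> np.ndarray:
    """Indices of the d columns of X with the largest |marginal correlation| with y."""
    Xc = X - X.mean(axis=0)
    yc = y - y.mean()
    with np.errstate(divide="ignore", invalid="ignore"):
        corr = np.abs(Xc.T @ yc) / np.sqrt((Xc ** 2).sum(axis=0) * (yc ** 2).sum())
    corr = np.nan_to_num(corr)  # monomorphic SNPs (zero variance) get correlation 0
    return np.argsort(corr)[::-1][:d]


def two_stage_screen(X: np.ndarray, y: np.ndarray, d_main: int, d_int: int):
    """Hypothetical two-stage screen: main effects first, then pairwise products
    among the retained SNPs (an assumption, not necessarily the published TS-SIS)."""
    main = sorted(sis(X, y, d_main).tolist())
    pairs = list(itertools.combinations(main, 2))
    if not pairs:
        return main, []
    # Candidate epistatic terms: elementwise products of retained SNP columns.
    Z = np.column_stack([X[:, j] * X[:, k] for j, k in pairs])
    keep = sis(Z, y, d_int).tolist()
    return main, [pairs[i] for i in keep]
```

The retained SNPs and pairwise products would then be passed to a penalized regression such as LASSO or SCAD, as the abstract describes. Without a data-driven rule like RATE, a common default for the screened size in sure independence screening is on the order of n / log n.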
Briefings in Bioinformatics | 2013
Zhong Wang; Xiaoming Pang; Yafei Lv; Fang Xu; Tao Zhou; Xin Li; Jiahan Li; Zhikang Li; Rongling Wu
Despite its central role in the adaptation and microevolution of traits, the genetic architecture of phenotypic plasticity, i.e. multiple phenotypes produced by a single genotype in changing environments, remains elusive. We know little about the genes that underlie the plastic response of traits to the environment, their number, chromosomal locations and genetic interactions, or the impact of the environment on their effects. Here we review key statistical approaches for analyzing the genetic variation of phenotypic plasticity due to genotype-environment interactions and describe the implementation of a dynamic model to map specific quantitative trait loci (QTLs) that affect the gradient expression of a quantitative trait across a range of environments. This dynamic model is distinctive in incorporating mathematical aspects of phenotypic plasticity into a QTL mapping framework, thereby better unraveling the quantitative attribute of trait response to the environment. By testing the curve parameters that specify environment-dependent trajectories of the trait, the model allows a series of fundamental hypotheses about the interplay between gene action/interaction and environmental sensitivity to be tested quantitatively. The model can also make dynamic predictions of genetic control over phenotypic plasticity in changing environments. We demonstrate the usefulness of the model by reanalyzing a QTL data set for rice, gleaning new insights into the genetic basis for phenotypic plasticity in plant height growth.
Human Heredity | 2011
Kiranmoy Das; Jiahan Li; Guifang Fu; Zhong Wang; Rongling Wu
Objective: Longitudinal measurements with a bivariate response have been analyzed by several authors using a separate model for each response. However, in most biological or medical experiments the two responses are highly correlated, so a separate model for each response may not be a desirable way to analyze such data. A single model for the bivariate response provides more powerful inference because the correlation between the responses is modeled appropriately. In this article, we propose a dynamic statistical model to detect the genes controlling human blood pressure (systolic and diastolic). Methods: By modeling the mean function with orthogonal Legendre polynomials and the covariance matrix with a stationary parametric structure, we incorporate the statistical ideas of functional genome-wide association studies to detect SNPs that have significant control over human blood pressure. The conventional false discovery rate (FDR) is used to adjust for multiple comparisons. Results: We analyze the data from the Framingham Heart Study to detect such SNPs by appropriately considering gender-gene interaction. We detect 8 SNPs for males and 7 for females that are most significant in controlling blood pressure. The genotype-specific mean curves and the additive and dominance effects over time are shown for each significant SNP for both genders. Simulation studies are performed to examine the statistical properties of our model. The current model will be extremely useful in detecting genes controlling different traits and diseases for humans or non-human subjects.
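The Legendre mean-function idea above can be sketched by fitting a Legendre series to each genotype group after rescaling age to [-1, 1]. This sketch uses ordinary least squares via `numpy.polynomial.legendre`, which ignores the within-subject correlation that the paper models with a stationary parametric covariance, so it is only a rough illustration; the data layout (flattened time, phenotype, and genotype arrays) and the function names are assumptions.

```python
import numpy as np
from numpy.polynomial import legendre


def rescale(t, t_min, t_max):
    """Map ages/times from [t_min, t_max] onto [-1, 1], where Legendre polynomials are orthogonal."""
    return 2.0 * (np.asarray(t, dtype=float) - t_min) / (t_max - t_min) - 1.0


def fit_mean_curve(t, y, order, t_min, t_max):
    """Least-squares Legendre coefficients c_0..c_order for one group's mean curve."""
    return legendre.legfit(rescale(t, t_min, t_max), np.asarray(y, dtype=float), order)


def mean_curve(coef, t, t_min, t_max):
    """Evaluate a Legendre mean curve at times t."""
    return legendre.legval(rescale(t, t_min, t_max), coef)


def genotype_curves(t, y, genotype, order, t_min, t_max):
    """Fit one mean curve per SNP genotype (e.g. 0/1/2 copies of the minor allele)."""
    t, y, genotype = (np.asarray(a) for a in (t, y, genotype))
    return {
        int(k): fit_mean_curve(t[genotype == k], y[genotype == k], order, t_min, t_max)
        for k in np.unique(genotype)
    }
```

In a full functional GWAS model, a SNP would typically be tested by comparing the likelihood of genotype-specific curves against that of a single common curve; the coefficient vectors returned here are the quantities such a comparison would contrast.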
BMC Plant Biology | 2011
John Stephen Yap; Yao Li; Kiranmoy Das; Jiahan Li; Rongling Wu
Background: The identification of genes or quantitative trait loci that are expressed in response to different environmental factors, such as temperature and light, through functional mapping critically relies on precise modeling of the covariance structure. Previous work used separable parametric covariance structures, such as a Kronecker product of first-order autoregressive [AR(1)] matrices, that do not account for interaction effects of different environmental factors. Results: We implement a more robust nonparametric covariance estimator to model these interactions within the framework of functional mapping of reaction norms to two signals. Our results from Monte Carlo simulations show that this estimator can be useful in modeling interactions that exist between two environmental signals. The interactions are simulated using nonseparable covariance models with spatio-temporal structural forms that mimic interaction effects. Conclusions: The nonparametric covariance estimator has an advantage over separable parametric covariance estimators in the detection of QTL location, thus extending the breadth of use of functional mapping in practical settings.
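For reference, the separable structure described in the Background can be written as a Kronecker product of two AR(1) correlation matrices, one per environmental signal. Below is a minimal numpy sketch, assuming equally spaced levels for each signal and a single common variance; the nonparametric estimator proposed in the paper is not shown, and the function names are illustrative. The entry for levels (i1, i2) and (j1, j2) equals sigma2 * rho1^|i1 - j1| * rho2^|i2 - j2|, so the correlation along one signal cannot depend on the level of the other, which is why such a structure cannot express interaction effects.

```python
import numpy as np


def ar1(n: int, rho: float) -> np.ndarray:
    """n x n AR(1) correlation matrix with entries rho**|i - j|."""
    idx = np.arange(n)
    return rho ** np.abs(idx[:, None] - idx[None, :])


def separable_cov(n1: int, rho1: float, n2: int, rho2: float,
                  sigma2: float = 1.0) -> np.ndarray:
    """sigma2 * kron(AR1(rho1), AR1(rho2)).

    Rows and columns are ordered with signal 1 as the outer index, so
    (level i1 of signal 1, level i2 of signal 2) maps to row i1 * n2 + i2.
    """
    return sigma2 * np.kron(ar1(n1, rho1), ar1(n2, rho2))
```

Because both AR(1) factors are positive definite for |rho| < 1, their Kronecker product is positive definite as well, so the result is always a valid covariance matrix.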
International Journal of Plant Genomics | 2010
Jiahan Li; Kiranmoy Das; Guifang Fu; Chunfa Tong; Yao Li; Christian M. Tobias; Rongling Wu
Multivalent tetraploids, which include many plant species such as potato, sugarcane, and rose, are of paramount importance to agricultural production and biological research. Quantitative trait locus (QTL) mapping in multivalent tetraploids is challenged by their unique cytogenetic properties, such as double reduction. We develop a statistical method for mapping multivalent tetraploid QTLs by considering these cytogenetic properties. This method is built in the mixture model-based framework and implemented with the EM algorithm. The method allows the simultaneous estimation of QTL positions, QTL effects, the chromosomal pairing factor, and the degree of double reduction, as well as the assessment of the estimation precision of these parameters. We used simulated data to examine the statistical properties of the method and validate its utilization. The new method and its software will provide a useful tool for QTL mapping in multivalent tetraploids that undergo double reduction.
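The mixture-model EM idea can be sketched in a generic form: each individual's QTL genotype is unknown, but its conditional probabilities given the markers are assumed to be available as a prior matrix, and phenotypes are normal with genotype-specific means and a common variance. This is a simplified, interval-mapping-style sketch, not the tetraploid model itself; it does not estimate the chromosomal pairing factor or the degree of double reduction, and the input layout and function names are hypothetical.

```python
import numpy as np


def normal_pdf(y, mu, s2):
    """Normal density with mean mu and variance s2."""
    return np.exp(-((y - mu) ** 2) / (2.0 * s2)) / np.sqrt(2.0 * np.pi * s2)


def mixture_em(y, prior, n_iter=500, tol=1e-8):
    """EM for a normal mixture whose mixing weights are fixed per individual.

    y     : (n,) phenotypes
    prior : (n, K) hypothetical input: probability of each QTL genotype given
            the markers, one row per individual (rows sum to 1)
    Returns genotype means mu (K,) and the common residual variance s2.
    """
    y = np.asarray(y, dtype=float)
    prior = np.asarray(prior, dtype=float)
    # Start from prior-weighted means so genotype labels stay anchored to the markers.
    mu = (prior * y[:, None]).sum(axis=0) / prior.sum(axis=0)
    s2 = y.var()
    for _ in range(n_iter):
        # E-step: posterior probability of each genotype for each individual.
        dens = prior * normal_pdf(y[:, None], mu[None, :], s2)
        w = dens / dens.sum(axis=1, keepdims=True)
        # M-step: posterior-weighted means and pooled residual variance.
        mu_new = (w * y[:, None]).sum(axis=0) / w.sum(axis=0)
        s2_new = (w * (y[:, None] - mu_new[None, :]) ** 2).sum() / y.size
        done = np.max(np.abs(mu_new - mu)) < tol and abs(s2_new - s2) < tol
        mu, s2 = mu_new, s2_new
        if done:
            break
    return mu, s2
```

The mixing weights are held fixed rather than estimated because, in mixture-model QTL mapping, they come from the marker data and the assumed inheritance model; a tetraploid version would derive them from a model that includes double reduction.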
Genetics Research | 2009
Jiahan Li; Qin Li; Wei Hou; Kun Han; Yao Li; Song Wu; Yanchun Li; Rongling Wu
A linkage-linkage disequilibrium map that describes the pattern and extent of linkage disequilibrium (LD) decay with genomic distance has now emerged as a viable tool to unravel the genetic structure of population differentiation and fine-map genes for complex traits. The prerequisite for constructing such a map is the simultaneous estimation of the linkage and LD between different loci. Here, we develop a computational algorithm for simultaneously estimating the recombination fraction and LD, which most molecular genetic studies estimate separately, in a natural outcrossing population with multilocus marker data. The algorithm is founded on a commonly used progeny test with open-pollinated offspring sampled from a natural population. The information about LD is reflected in the co-segregation of alleles at different loci among parents in the population. Open mating of parents will reveal the genetic linkage of alleles during meiosis. The algorithm was constructed within the polynomial-based mixture framework and implemented with the Expectation-Maximization (EM) algorithm. A by-product of the derivation of this algorithm is the estimation of the outcrossing rate, a parameter useful for exploring the genetic diversity of the population. We performed computer simulation to investigate the influences of different sampling strategies and different parameter values on parameter estimation. By providing a number of testable hypotheses about population genetic parameters, this algorithmic model will open a broad gateway to understanding the genetic structure and dynamics of an outcrossing population under natural selection.
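The two-locus quantities involved can be illustrated with standard formulas: the disequilibrium coefficient D = p_AB - p_A * p_B, its normalized forms D' and r^2, and the textbook decay D_t = (1 - r)^t * D_0 under random mating, where r is the recombination fraction. This is a minimal sketch, assuming the four haplotype frequencies are already known; the paper's joint EM estimation from open-pollinated progeny is not reproduced, and the function names are illustrative.

```python
import math


def ld_stats(p_AB: float, p_Ab: float, p_aB: float, p_ab: float):
    """Return (D, D', r^2) for two biallelic loci from haplotype frequencies."""
    if not math.isclose(p_AB + p_Ab + p_aB + p_ab, 1.0):
        raise ValueError("haplotype frequencies must sum to 1")
    p_A = p_AB + p_Ab  # frequency of allele A at locus 1
    p_B = p_AB + p_aB  # frequency of allele B at locus 2
    D = p_AB - p_A * p_B
    # D' divides D by its maximum attainable magnitude given the allele frequencies.
    if D > 0:
        d_max = min(p_A * (1 - p_B), (1 - p_A) * p_B)
    elif D < 0:
        d_max = min(p_A * p_B, (1 - p_A) * (1 - p_B))
    else:
        d_max = 1.0  # D' is 0 when D is 0
    d_prime = D / d_max if d_max > 0 else 0.0
    denom = p_A * (1 - p_A) * p_B * (1 - p_B)
    r2 = D * D / denom if denom > 0 else 0.0
    return D, d_prime, r2


def ld_after_generations(d0: float, r: float, t: int) -> float:
    """LD remaining after t generations of random mating: D_t = (1 - r)**t * D_0."""
    return (1.0 - r) ** t * d0
```

The decay formula is what ties linkage to LD: tightly linked loci (small r) retain disequilibrium for many generations, while unlinked loci (r = 0.5) lose half of it each generation.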