Jingyuan Liu
Pennsylvania State University
Publications
Featured research published by Jingyuan Liu.
Journal of the American Statistical Association | 2014
Jingyuan Liu; Runze Li; Rongling Wu
This article is concerned with feature screening and variable selection for varying coefficient models with ultrahigh-dimensional covariates. We propose a new feature screening procedure for these models based on the conditional correlation coefficient. We systematically study the theoretical properties of the proposed procedure, and establish its sure screening property and ranking consistency. To enhance the finite sample performance of the proposed procedure, we further develop an iterative feature screening procedure. Monte Carlo simulation studies were conducted to examine the performance of the proposed procedures. In practice, we advocate a two-stage approach for varying coefficient models. The two-stage approach consists of (a) reducing the ultrahigh dimensionality by using the proposed procedure and (b) applying regularization methods for dimension-reduced varying coefficient models to make statistical inferences on the coefficient functions. We illustrate the proposed two-stage approach by a real data example. Supplementary materials for this article are available online.
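The screening stage (a) described in this abstract can be illustrated with a rough sketch. The code below is not the authors' estimator: it assumes a crude stand-in in which the index variable `u` is split into quantile bins and the squared sample correlation between each covariate and the response is averaged over the bins. The function name `conditional_corr_screen`, the parameters `n_bins` and `keep`, and the simulated data are all hypothetical choices made for illustration. Stage (b), the regularized fit on the retained covariates, is not shown.

```python
import numpy as np

def conditional_corr_screen(X, y, u, n_bins=5, keep=10):
    """Rank columns of X by a binned estimate of E[corr(X_j, y | u)^2].

    Assumption of this sketch: conditioning on u is approximated by
    quantile bins rather than by kernel smoothing.
    """
    edges = np.quantile(u, np.linspace(0, 1, n_bins + 1))
    bin_id = np.clip(np.searchsorted(edges, u, side="right") - 1, 0, n_bins - 1)
    score = np.zeros(X.shape[1])
    for b in range(n_bins):
        mask = bin_id == b
        Xb = X[mask] - X[mask].mean(axis=0)   # center covariates within the bin
        yb = y[mask] - y[mask].mean()         # center response within the bin
        num = Xb.T @ yb
        den = np.sqrt((Xb ** 2).sum(axis=0) * (yb ** 2).sum())
        score += mask.mean() * (num / den) ** 2  # weight by bin share
    top = np.argsort(score)[::-1][:keep]      # indices of the highest scores
    return top, score

# Simulated varying coefficient model: only columns 0 and 1 are active.
rng = np.random.default_rng(0)
n, p = 300, 200
u = rng.uniform(size=n)
X = rng.normal(size=(n, p))
y = (2 + u) * X[:, 0] - 3 * X[:, 1] + 0.5 * rng.normal(size=n)

top, score = conditional_corr_screen(X, y, u)
print(top)
```

In this toy setting the two active covariates should rank near the top, after which a penalized varying coefficient fit could be applied to the retained columns only.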
Briefings in Bioinformatics | 2015
Libo Jiang; Jingyuan Liu; Xuli Zhu; Meixia Ye; Lidan Sun; Xavier Lacaze; Rongling Wu
Whole-genome search of genes is an essential approach to dissecting complex traits, but a marginal one-single-nucleotide polymorphism (SNP)/one-phenotype regression analysis widely used in current genome-wide association studies fails to estimate the net and cumulative effects of SNPs and reveal the developmental pattern of interplay between genes and traits. Here we describe a computational framework, which we refer to as two-side high-dimensional genome-wide association studies (2HiGWAS), to associate an ultrahigh dimension of SNPs with a high dimension of developmental trajectories measured across time and space. The model is implemented with a dual dimension-reduction procedure for both predictors and responses to select a sparse but full set of significant loci from an extremely large pool of SNPs and estimate their net time-varying effects on trait development. The model can not only help geneticists to precisely identify an entire set of genes underlying complex traits but also allow them to elucidate a global picture of how genes control developmental and dynamic processes of trait formation. We investigated the statistical properties of the model via extensive simulation studies. With the increasing availability of GWAS in various organisms, 2HiGWAS will have important implications for genetic studies of developmental complex traits.
Frontiers in Genetics | 2012
Jingyuan Liu; Zhong Wang; Yaqun Wang; Runze Li; Rongling Wu
The multilocus analysis of polymorphisms has emerged as a vital ingredient of population genetics and evolutionary biology. A fundamental assumption used for existing multilocus analysis approaches is Hardy–Weinberg equilibrium at which maternally- and paternally-derived gametes unite randomly during fertilization. Given the fact that natural populations are rarely panmictic, these approaches will have a significant limitation for practical use. We present a robust model for multilocus linkage disequilibrium analysis which does not rely on the assumption of random mating. This new disequilibrium model capitalizes on Weir’s definition of zygotic disequilibria and is based on an open-pollinated design in which multiple maternal individuals and their half-sib families are sampled from a natural population. This design captures two levels of associations: one is at the upper level that describes the pattern of cosegregation between different loci in the parental population and the other is at the lower level that specifies the extent of co-transmission of non-alleles at different loci from parents to their offspring. An MCMC method was implemented to estimate genetic parameters that define these associations. Simulation studies were used to validate the statistical behavior of the new model.
Methods of Molecular Biology | 2012
Jiahan Li; Kiranmoy Das; Jingyuan Liu; Guifang Fu; Yao Li; Christian M. Tobias; Rongling Wu
Statistical methods for genetic mapping have been well developed for diploid species but are lagging in the more complex polyploids. The genetic mapping of polyploids, where genome number is higher than two, is complicated by uncertainty about the genotype-phenotype correspondence, inconsistent meiotic mechanisms, heterozygous genome structures, and increased allelic (action) and nonallelic (interaction) combinations. According to their meiotic configurations, polyploids can be classified as bivalent polyploids, in which only two chromosomes pair during meiosis at a time, and multivalent polyploids, where multiple chromosomes pair simultaneously. For some polyploids, these two types of pairing occur at the same time, leading to a mixed category. This chapter reviews several challenges due to the complexities of linkage analysis in polyploids and describes statistical models and algorithms that have been developed for linkage mapping based on their distinct meiotic characteristics. We discuss several issues that should be addressed to better understand the genome structure and organization of polyploids and the genetic architecture of complex traits for this unique group of plants.
Frontiers in Genetics | 2012
Zhong Wang; Jingyuan Liu; Jianxin Wang; Yaqun Wang; Ningtao Wang; Yao Li; Runze Li; Rongling Wu
The growing evidence that cancer originates from stem cells (SC) holds great promise for eliminating this disease by designing specific drug therapies for removing cancer SC. Translation of this knowledge into predictive tests for the clinic is hampered by the lack of methods to discriminate cancer SC from non-cancer SC. Here, we address this issue by describing a conceptual strategy for identifying the genetic origins of cancer SC. The strategy incorporates a high-dimensional group of differential equations that characterizes the proliferation, differentiation, and reprogramming of cancer SC in a dynamic cellular and molecular system. The deployment of robust mathematical models will help uncover and explain many still unknown aspects of cell behavior, tissue function, and network organization related to the formation and division of cancer SC. The statistical method developed allows biologically meaningful hypotheses about the genetic control mechanisms of carcinogenesis and metastasis to be tested in a quantitative manner.
Neurocomputing | 2016
Jingyuan Liu
This paper is concerned with longitudinal partially linear models (LPLM) with ultrahigh-dimensional covariates and predictors. As a flexible extension of linear regression models that allows a nonparametric intercept function to capture the overall trend over time, the LPLM are expected to be highly promising statistical models for analyzing high-dimensional longitudinal data such as longitudinal genetic data and functional magnetic resonance image data. Feature screening and variable selection are indispensable for LPLM in the presence of ultrahigh-dimensional covariates such as genetic markers and all pixels in image data. This paper proposes a two-stage variable selection procedure, consisting of a quick screening stage and a post-screening refining stage, for ultrahigh-dimensional longitudinal partially linear models. The proposed approach is based on the partial residual method for dealing with the nonparametric baseline function. We establish the sure screening property of the proposed screening procedure in the first stage. Simulation results demonstrate the validity of this two-stage method. We further demonstrate the proposed methodology by an empirical analysis of a real data set collected in a soybean plant longitudinal genetic study.
Statistica Sinica | 2018
Jingyuan Liu; Runze Li; Lejia Lou
A partial-correlation-based variable selection method was proposed for normal linear regression models by Bühlmann, Kalisch and Maathuis (2010) as a comparable alternative to regularization methods for variable selection. This paper addresses two important issues related to this method: (a) whether it is sensitive to the normality assumption, and (b) whether it is valid when the dimension of the predictors increases at an exponential rate of the sample size. To address issue (a), we systematically study this method for elliptical linear regression models. Our findings indicate that the original proposal may lead to inferior performance when the marginal kurtosis of the predictors is not close to that of the normal distribution. Our simulation results further confirm this finding. To ensure the superior performance of the partial-correlation-based variable selection procedure, we propose a thresholded partial correlation (TPC) approach to select significant variables in linear regression models. We establish the selection consistency of the TPC in the presence of ultrahigh-dimensional predictors. Since the TPC procedure includes the original proposal as a special case, our theoretical results address issue (b) directly. As a by-product, the sure screening property of the first step of TPC was obtained. The numerical examples also illustrate that the TPC is competitive with the commonly used regularization methods for variable selection.
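The idea of selecting variables by thresholding partial correlations can be sketched in a low-dimensional setting. The code below is only an illustration and not the TPC procedure itself: it assumes n is larger than the number of predictors, computes full-order partial correlations from the inverse sample covariance matrix, and keeps predictors whose Fisher z-transformed partial correlation exceeds a normal-quantile cutoff. The function name `partial_corr_select` and the `kurtosis` argument, which inflates the cutoff by sqrt(1 + kurtosis), are hypothetical devices meant to mimic an adjustment for non-normal elliptical tails; they are assumptions of this sketch, not the paper's formula.

```python
import numpy as np
from statistics import NormalDist

def partial_corr_select(X, y, alpha=0.05, kurtosis=0.0):
    """Select columns of X whose partial correlation with y is significant.

    Assumptions of this sketch: n > p, and the cutoff scaling by
    sqrt(1 + kurtosis) is a hypothetical adjustment for heavy tails.
    """
    n, p = X.shape
    Z = np.column_stack([y, X])
    omega = np.linalg.inv(np.cov(Z, rowvar=False))  # precision matrix
    d = np.sqrt(np.diag(omega))
    # Partial correlation of y with each x_j given all other predictors.
    pcorr = -omega[0, 1:] / (d[0] * d[1:])
    z = np.arctanh(pcorr)                           # Fisher z-transform
    k = p - 1                                       # size of conditioning set
    quantile = NormalDist().inv_cdf(1 - alpha / 2)
    cutoff = np.sqrt(1.0 + kurtosis) * quantile / np.sqrt(n - k - 3)
    selected = np.flatnonzero(np.abs(z) > cutoff)
    return selected, pcorr

# Simulated linear model: only columns 0 and 1 have nonzero coefficients.
rng = np.random.default_rng(1)
n, p = 500, 5
X = rng.normal(size=(n, p))
y = 2.0 * X[:, 0] - 1.5 * X[:, 1] + rng.normal(size=n)

selected, pcorr = partial_corr_select(X, y)
print(selected, np.round(pcorr, 2))
```

With `kurtosis=0.0` the cutoff reduces to the usual normal-theory test. The ultrahigh-dimensional case treated in the paper needs lower-order conditioning sets, which this sketch does not attempt.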
Science China-mathematics | 2017
LuHeng Wang; Jingyuan Liu; Yong Li; Runze Li
Feature screening plays an important role in ultrahigh dimensional data analysis. This paper is concerned with conditional feature screening when one is interested in detecting the association between the response and ultrahigh dimensional predictors (e.g., genetic markers) given a low-dimensional exposure variable (such as clinical variables or environmental variables). To this end, we first propose a new index to measure conditional independence, and further develop a conditional screening procedure based on the newly proposed index. We systematically study the theoretical property of the proposed procedure and establish the sure screening and ranking consistency properties under some very mild conditions. The newly proposed screening procedure enjoys some appealing properties. (a) It is model-free in that its implementation does not require a specification on the model structure; (b) it is robust to heavy-tailed distributions or outliers in both directions of response and predictors; and (c) it can deal with both feature screening and the conditional screening in a unified way. We study the finite sample performance of the proposed procedure by Monte Carlo simulations and further illustrate the proposed method through two real data examples.
Quality and Reliability Engineering International | 2016
Liangxing Shi; Qiumeng He; Jingyuan Liu; Zhen He
Measurement system capability analysis aims to determine whether the measurement system is capable for use in quality control. The existing research has been extended from univariate to multivariate cases. Two approaches, the multivariate analysis of variance (MANOVA) and the weighted principal components (WPC), were advocated in the literature. The MANOVA method is constructed based on the volume ratio that treats the volume of constant-density contours as the variability estimations. However, it ignores the fact that the relative position change of multivariate measurement errors could affect the measurement system capability. The WPC method uses dimension reduction to reduce the complexity but is unable to build the precision-to-tolerance ratio because it does not include tolerance. In this paper, we propose a modified-region-based method to compute the precision-to-tolerance ratio, the percent of repeatability and reproducibility, and the signal-to-noise ratio. This method also incorporates the variance–covariance structure of the measurement errors when dealing with the constant-density contours of tolerances, total variation, and process variation. The performance of the modified-region-based method is evaluated based on a dataset from the literature and a set of relevant simulations. The proposed method proves to be effective compared with other methods.
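For orientation, the three criteria named in this abstract have familiar univariate counterparts, sketched below. This is background only and does not implement the multivariate modified-region-based method. The function name `gauge_metrics`, the multiplier `k=6.0` (some references use 5.15), and the numeric inputs are assumptions chosen for illustration.

```python
import math

def gauge_metrics(sd_gauge, sd_process, lsl, usl, k=6.0):
    """Univariate measurement system capability criteria.

    sd_gauge:   standard deviation of measurement error (repeatability
                and reproducibility combined)
    sd_process: standard deviation of the true part-to-part variation
    lsl, usl:   lower and upper specification limits
    k:          spread multiplier; 6.0 is assumed here, conventions vary
    """
    sd_total = math.sqrt(sd_gauge ** 2 + sd_process ** 2)
    return {
        # Precision-to-tolerance ratio: gauge spread relative to tolerance width.
        "pt_ratio": k * sd_gauge / (usl - lsl),
        # Percent R&R: gauge variation as a share of total variation.
        "pct_rr": 100.0 * sd_gauge / sd_total,
        # Signal-to-noise ratio: sqrt(2 * process variance / gauge variance).
        "snr": math.sqrt(2.0) * sd_process / sd_gauge,
    }

m = gauge_metrics(sd_gauge=0.1, sd_process=0.4, lsl=9.0, usl=11.0)
print(m)
```

In the multivariate setting the scalar standard deviations are replaced by covariance matrices, which is where the choice of region for the constant-density contours matters.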
BMC Genetics | 2012
Wei Hou; Yihan Sui; Zhong Wang; Yaqun Wang; Ningtao Wang; Jingyuan Liu; Yao Li; Maureen M. Goodenow; Li Yin; Zuoheng Wang; Rongling Wu
Mathematical models of viral dynamics in vivo provide incredible insights into the mechanisms for the nonlinear interaction between virus and host cell populations, the dynamics of viral drug resistance, and the way to eliminate virus infection from individual patients by drug treatment. The integration of these mathematical models with high-throughput genetic and genomic data within a statistical framework raises hope for effective treatment of HIV infections through developing potent antiviral drugs based on individual patients’ genetic makeup. In this opinion article, we will show a conceptual model for mapping and dictating a comprehensive picture of genetic control mechanisms for viral dynamics through incorporating a group of differential equations that quantify the emergent properties of a system.