Publication


Featured research published by Ao Yuan.


Scandinavian Journal of Statistics | 2007

Semiparametric Regression with Kernel Error Model

Ao Yuan; Jan G. De Gooijer

We propose and study a class of regression models, in which the mean function is specified parametrically as in the existing regression methods, but the residual distribution is modeled nonparametrically by a kernel estimator, without imposing any assumption on its distribution. This specification is different from the existing semiparametric regression models. The asymptotic properties of such likelihood and the maximum likelihood estimate (MLE) under this semiparametric model are studied. We show that under some regularity conditions, the MLE under this model is consistent (as compared to the possibly pseudo consistency of the parameter estimation under the existing parametric regression model), and is asymptotically normal with rate $\sqrt{n}$ and efficient. The nonparametric pseudo-likelihood ratio has the Wilks property as the true likelihood ratio does. Simulated examples are presented to evaluate the accuracy of the proposed semiparametric MLE method.
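The idea of a parametric mean with a kernel-estimated error density can be sketched as follows. This is only an illustration under stated assumptions, not the authors' estimator: the mean function is a hypothetical one-parameter line `beta * x`, the kernel is Gaussian with a fixed bandwidth, the density is evaluated leave-one-out, the maximization is a plain grid search, and the data use deterministic pseudo-noise.

```python
import math

def gaussian_kernel(u):
    """Standard normal density, used as the smoothing kernel (an assumption)."""
    return math.exp(-0.5 * u * u) / math.sqrt(2.0 * math.pi)

def loo_kernel_loglik(beta, xs, ys, bandwidth):
    """Log-likelihood of residuals under a leave-one-out kernel density estimate."""
    residuals = [y - beta * x for x, y in zip(xs, ys)]
    n = len(residuals)
    total = 0.0
    for i, e_i in enumerate(residuals):
        dens = sum(gaussian_kernel((e_i - e_j) / bandwidth)
                   for j, e_j in enumerate(residuals) if j != i)
        dens /= (n - 1) * bandwidth
        total += math.log(max(dens, 1e-300))  # guard against log(0)
    return total

def fit_slope(xs, ys, bandwidth, grid):
    """Return the grid value of beta that maximizes the kernel log-likelihood."""
    return max(grid, key=lambda b: loo_kernel_loglik(b, xs, ys, bandwidth))

# Hypothetical data: true slope 2 plus small deterministic pseudo-noise.
xs = [float(i) for i in range(1, 21)]
ys = [2.0 * x + 0.4 * math.sin(1.7 * x) for x in xs]
grid = [k / 100 for k in range(0, 401)]
beta_hat = fit_slope(xs, ys, bandwidth=0.5, grid=grid)
print(beta_hat)
```

The grid search keeps the sketch dependency-free; a real analysis would choose the bandwidth from the data and use a proper optimizer.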


Human Genetics | 2006

Detecting disease gene in DNA haplotype sequences by nonparametric dissimilarity test.

Ao Yuan; Qingqi Yue; Victor Apprey; George E. Bonney

Association studies for complex diseases based on haplotype data have received increasing attention in the last few years. A commonly used nonparametric method, which takes haplotype structure into consideration, is to use the U-statistic to compare the similarities between genetic compositions in the case and control populations. Although the method and its variants are convenient to use in practice, there are some areas where the tests cannot detect even large differences between cases and controls. To overcome this problem and enhance the power, we propose a new form of the weighted U-statistic, which directly compares the dissimilarity between the haplotype structures in the case and control populations. We show that this test statistic is asymptotically a linear combination of the absolute values of normal random variables under the null hypothesis, and shifts strictly toward the right under the alternative, and therefore has no blind areas of detection. Simulation studies indicate that our test statistic overcomes the weakness of the existing ones and is robust and powerful as well.


Annals of Human Genetics | 2012

Bayes Factor Based on the Trend Test Incorporating Hardy–Weinberg Disequilibrium: More Power to Detect Genetic Association

Jinfeng Xu; Ao Yuan; Gang Zheng

In the analysis of case‐control genetic association, the trend test and Pearson’s test are the two most commonly used tests. In genome‐wide association studies (GWAS), Bayes factor (BF) is a useful tool to support significant P‐values, and a better measure than P‐value when results are compared across studies with different sample sizes. When reporting the P‐value of the trend test, we propose a BF directly based on the trend test. To improve the power to detect association under recessive or dominant genetic models, we propose a BF based on the trend test and incorporating Hardy–Weinberg disequilibrium in cases. When the true model is unknown, or both the trend test and Pearson’s test or other robust tests are applied in genome‐wide scans, we propose a joint BF, combining the previous two BFs. All three BFs studied in this paper have closed forms and are easy to compute without integrations, so they can be reported along with P‐values, especially in GWAS. We discuss how to use each of them and how to specify priors. Simulation studies and applications to three GWAS are provided to illustrate their usefulness to detect nonadditive gene susceptibility in practice.
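For orientation, the trend test on which these Bayes factors are built can be sketched as below. This is the standard Cochran-Armitage trend statistic with additive scores (0, 1, 2), which is an assumption here; the closed-form BFs from the paper are not reproduced, and the function name and genotype counts are hypothetical.

```python
import math

def trend_test(cases, controls, weights=(0.0, 1.0, 2.0)):
    """Cochran-Armitage trend test for a 2 x 3 genotype table.

    cases, controls: genotype counts ordered by number of risk alleles.
    Returns (z, two_sided_p) using the normal approximation.
    """
    R, S = sum(cases), sum(controls)
    N = R + S
    totals = [r + s for r, s in zip(cases, controls)]
    # Score statistic: weighted difference between case and control counts.
    t = sum(w * (S * r - R * s) for w, r, s in zip(weights, cases, controls))
    sum_wn = sum(w * n for w, n in zip(weights, totals))
    sum_w2n = sum(w * w * n for w, n in zip(weights, totals))
    variance = R * S / N * (N * sum_w2n - sum_wn ** 2)
    z = t / math.sqrt(variance)
    p = math.erfc(abs(z) / math.sqrt(2.0))
    return z, p

# Hypothetical counts: cases carry more risk alleles than controls.
z, p = trend_test(cases=(10, 20, 30), controls=(30, 20, 10))
print(z, p)
```

Changing `weights` to (0, 0, 1) or (0, 1, 1) gives the recessive and dominant scores, the models under which the abstract says the plain trend test loses power.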


Bioinformatics and Biology Insights | 2008

Gene copy number analysis for family data using semiparametric copula model.

Ao Yuan; Guanjie Chen; Zhong-Cheng Zhou; George E. Bonney; Charles N. Rotimi

Gene copy number changes are common characteristics of many genetic disorders. A new technology, array comparative genomic hybridization (a-CGH), is widely used today to screen for gains and losses in cancers and other genetic diseases with high resolution at the genome level or for specific chromosomal regions. Statistical methods for analyzing such a-CGH data have been developed. However, most of the existing methods are for data on unrelated individuals, and their results explain only horizontal variations in copy number changes. It is potentially meaningful to develop a statistical method that will allow for the analysis of family data to investigate the vertical kinship effects as well. Here we consider a semiparametric model based on a clustering method in which the marginal distributions are estimated nonparametrically, and the familial dependence structure is modeled by a copula. The model is illustrated and evaluated using simulated data. Our results show that the proposed method is more robust than the commonly used multivariate normal model. Finally, we demonstrate the utility of our method using a real dataset.


Bioinformatics and Biology Insights | 2012

Simultaneous Analysis of Common and Rare Variants in Complex Traits: Application to SNPs (SCARVAsnp)

Guanjie Chen; Ao Yuan; Yanxun Zhou; Amy R. Bentley; Jie Zhou; Weiping Chen; Daniel Shriner; Adebowale Adeyemo; Charles N. Rotimi

Advances in technology and reduced costs are facilitating large-scale sequencing of genes and exomes as well as entire genomes. Recently, we described an approach based on haplotypes called SCARVA [1] that enables the simultaneous analysis of the association between rare and common variants in disease etiology. Here, we describe an extension of SCARVA that evaluates individual markers instead of haplotypes. This modified method (SCARVAsnp) is implemented in four stages. First, all common variants in a pre-specified region (eg, gene) are evaluated individually. Second, a union procedure is used to combine all rare variants (RVs) in the index region, and the ratio of the log likelihood with one RV excluded to the log likelihood of a model with all the collapsed RVs is calculated. On the basis of previously reported simulation studies [1], a likelihood ratio ≥ 1.3 is considered statistically significant. Third, the direction of the association of the removed RV is determined by evaluating the change in λ values with the inclusion and exclusion of that RV. Lastly, significant common and rare variants, along with covariates, are included in a final regression model to evaluate the association between the trait and variants in that region. We apply simulated and real data sets to show that the method is simple to use, computationally efficient, and that it can accurately identify both common and rare risk variants. This method overcomes several limitations of existing methods. For example, SCARVAsnp limits loss of statistical power by not including variants that are not associated with the trait of interest in the final model. Also, SCARVAsnp takes into consideration the direction of association by effectively modelling positively and negatively associated variants.
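The union (collapsing) step in the second stage can be illustrated with a small sketch. It is not the SCARVAsnp implementation: the function names, the minor allele frequency threshold, and the toy genotype matrix are all hypothetical, genotypes are assumed to be coded as minor allele counts (0, 1, 2), and the likelihood ratio itself is not computed.

```python
def minor_allele_freq(genotypes, j):
    """Frequency of the minor allele at variant j (rows are individuals)."""
    return sum(row[j] for row in genotypes) / (2 * len(genotypes))

def split_variants(genotypes, maf_threshold):
    """Partition variant indices into common and rare by an assumed MAF cutoff."""
    n_variants = len(genotypes[0])
    common = [j for j in range(n_variants)
              if minor_allele_freq(genotypes, j) >= maf_threshold]
    rare = [j for j in range(n_variants)
            if minor_allele_freq(genotypes, j) < maf_threshold]
    return common, rare

def collapse(genotypes, rare, exclude=None):
    """Union indicator: 1 if an individual carries any rare variant.

    Passing `exclude` drops one rare variant, which gives the covariate for
    the 'one RV excluded' model that the full model is compared against.
    """
    return [int(any(row[j] > 0 for j in rare if j != exclude))
            for row in genotypes]

# Hypothetical toy data: 6 individuals, 3 variants.
genotypes = [
    [1, 0, 0],
    [2, 1, 0],
    [1, 0, 0],
    [0, 0, 1],
    [1, 0, 0],
    [0, 0, 0],
]
common, rare = split_variants(genotypes, maf_threshold=0.1)
all_rare = collapse(genotypes, rare)
without_rv1 = collapse(genotypes, rare, exclude=1)
print(common, rare, all_rare, without_rv1)
```

In a full analysis each collapsed indicator would enter a regression model, and the two fitted log likelihoods would then be compared as the abstract describes.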


BMC Genetics | 2005

Genome scan linkage analysis comparing microsatellites and single-nucleotide polymorphisms markers for two measures of alcoholism in chromosomes 1, 4, and 7

Guanjie Chen; Adebowale Adeyemo; Jie Zhou; Ao Yuan; Yuanxiu Chen; Charles N. Rotimi

Background: We analyzed 143 pedigrees (364 nuclear families) in the Collaborative Study on the Genetics of Alcoholism (COGA) data provided to the participants in the Genetic Analysis Workshop 14 (GAW14), with the goal of comparing results obtained from genome linkage analysis using microsatellite markers with results obtained using SNP markers for two measures of alcoholism: maximum number of drinks (MAXDRINK) and an electrophysiological measure from EEG (TTTH1). First, we constructed haplotype blocks by using the entire set of single-nucleotide polymorphisms (SNPs) in chromosomes 1, 4, and 7. These chromosomes have shown linkage signals for MAXDRINK or EEG-TTTH1 in previous reports. Second, we randomly selected one, two, three, four, and five SNPs from each block (referred to as Rep1 – Rep5, respectively) to conduct linkage analysis using a variance component approach. Finally, results of all SNP analyses were compared with those obtained using microsatellite markers.

Results: The LOD scores obtained from SNPs were slightly higher, but the curves were not radically different from those obtained from microsatellite analyses. The peaks of linkage regions from SNP sets were slightly shifted to the left when compared to those from microsatellite markers. The reduced sets of SNPs provide signals in the same linkage regions but with a smaller LOD score, suggesting a significant impact of the decrease in information content on linkage results. The widths of the 1-LOD support intervals of linkage regions from SNP sets were smaller when compared to those of microsatellite markers. However, two linkage regions obtained from the microsatellite linkage analysis on chromosome 7 for LOG of TTTH1 were not detected in the SNP-based analyses.

Conclusion: The linkage results from SNPs showed narrower linkage regions and slightly higher LOD scores when compared to those of microsatellite markers. The different builds of the genetic maps used for microsatellite and SNP markers and/or errors in genotyping may account for the microsatellite linkage signals on chromosome 7 that were not identified using SNPs. Also, unresolved map issues between SNPs and microsatellite markers may be partly responsible for the shifted linkage peaks when comparing the two types of markers.


Journal of Multivariate Analysis | 2012

U-statistic with side information

Ao Yuan; Wenqing He; Binhuan Wang; Gengsheng Qin

In this paper we study U-statistics with side information incorporated using the method of empirical likelihood. Some basic properties of the proposed statistics are investigated. We find that by implementing the side information properly, the proposed U-statistics can have smaller asymptotic variance than the existing U-statistics in the literature. The proposed U-statistics can achieve asymptotic efficiency in a formal sense and their weak limits admit a convolution result. We also find that the corresponding U-likelihood ratio procedure, as well as the U-empirical likelihood based confidence interval construction, do not benefit from incorporating side information, a result that is consistent with the result under the standard empirical likelihood ratio procedure. The impact of incorrect side information implementation in the proposed U-statistics is also explored. Simulation studies are conducted to assess the finite sample performance of the proposed method. The numerical results show that with side information implemented, the reduction of asymptotic variance can be substantial in some cases, and the coverage probability of the confidence interval using the U-empirical likelihood ratio based method outperforms that of the normal approximation based method, in particular in the cases when the underlying distribution is skewed.
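As background for the statistics discussed here, a plain order-2 U-statistic can be sketched as follows. This is only the classical, equally weighted version; the empirical likelihood weighting that carries the side information is not reproduced, and the function names are hypothetical.

```python
from itertools import combinations

def u_statistic(sample, kernel):
    """Average of a symmetric kernel over all unordered pairs of observations."""
    pairs = list(combinations(sample, 2))
    return sum(kernel(x, y) for x, y in pairs) / len(pairs)

def variance_kernel(x, y):
    """Kernel whose U-statistic is the unbiased sample variance."""
    return 0.5 * (x - y) ** 2

# Example: the U-statistic with this kernel equals the sample variance.
sample = [1.0, 2.0, 3.0, 4.0]
print(u_statistic(sample, variance_kernel))
```

Other symmetric kernels give other familiar estimators, for example `abs(x - y)` yields Gini's mean difference.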


Bioinformatics and Biology Insights | 2012

A Novel Approach for the Simultaneous Analysis of Common and Rare Variants in Complex Traits

Ao Yuan; Guanjie Chen; Yanxun Zhou; Amy R. Bentley; Charles N. Rotimi

Genome-wide association studies (GWAS) have been successful in detecting common genetic variants underlying common traits and diseases. Despite the GWAS success stories, the percent trait variance explained by GWAS signals, the so called “missing heritability” has been, at best, modest. Also, the predictive power of common variants identified by GWAS has not been encouraging. Given these observations along with the fact that the effects of rare variants are often, by design, unaccounted for by GWAS and the availability of sequence data, there is a growing need for robust analytic approaches to evaluate the contribution of rare variants to common complex diseases. Here we propose a new method that enables the simultaneous analysis of the association between rare and common variants in disease etiology. We refer to this method as SCARVA (simultaneous common and rare variants analysis). SCARVA is simple to use and is efficient. We used SCARVA to analyze two independent real datasets to identify rare and common variants underlying variation in obesity among participants in the Africa America Diabetes Mellitus (AADM) study and plasma triglyceride levels in the Dallas Heart Study (DHS). We found common and rare variants associated with both traits, consistent with published results.


Computational Statistics & Data Analysis | 2011

Some exact tests for manifest properties of latent trait models

Jan G. De Gooijer; Ao Yuan

Item response theory is one of the modern test theories with applications in educational and psychological testing. Recent developments made it possible to characterize some desired properties in terms of a collection of manifest ones, so that hypothesis tests on these traits can, in principle, be performed. But the existing test methodology is based on asymptotic approximation, which is impractical in most applications since the required sample sizes are often unrealistically huge. To overcome this problem, a class of tests is proposed for making exact statistical inference about four manifest properties: covariances given the sum are non-positive (CSN), manifest monotonicity (MM), conditional association (CA), and vanishing conditional dependence (VCD). One major advantage is that these exact tests do not require large sample sizes. As a result, tests for CSN and MM can be routinely performed in empirical studies. For testing CA and VCD, the exact methods are still impractical in most applications, due to the unusually large number of parameters to be tested. However, exact methods are still derived for them as an exploration toward practicality. Some numerical examples with applications of the exact tests for CSN and MM are provided.


Communications in Statistics - Theory and Methods | 2003

On adaptive transformation-retransformation estimate of conditional spatial median

Ali Gannoun; Jérôme Saracco; Ao Yuan; George E. Bonney

An affine equivariant modification of the conditional spatial median is proposed and studied. This modification uses an adaptive transformation–retransformation procedure based on a data-driven coordinate system. This new estimate of the multivariate conditional median improves upon the performance of the nonequivariant spatial median, especially when there is correlation among the real-valued components of the vector of interest as well as when the scales of those components are different. The proposed approach is based on minimizing a loss function equivalent to that in the univariate case. We indicate how to compute the proposed estimate and study its asymptotic properties. We also suggest an adaptive procedure to select the optimal data-driven coordinate system. We discuss the performance of our estimator with the help of a finite sample simulation study and illustrate our methodology with a data set on blood pressure measurements.

Collaboration

Top Co-Authors

Guanjie Chen, National Institutes of Health
Gang Zheng, National Institutes of Health
Charles N. Rotimi, National Institutes of Health
Qizhai Li, Chinese Academy of Sciences
Ming Tan, Georgetown University
Amy R. Bentley, National Institutes of Health
Jie Zhou, University of Washington