Ziheng Yang | Researchain

Publication

Featured research published by Ziheng Yang.

Bioinformatics | 1997

PAML: a program package for phylogenetic analysis by maximum likelihood

Ziheng Yang

PAML, currently in version 1.2, is a package of programs for phylogenetic analyses of DNA and protein sequences using the method of maximum likelihood (ML). The programs can be used for (i) maximum likelihood estimation of evolutionary parameters such as branch lengths in a phylogenetic tree, the transition/transversion rate ratio, the shape parameter of the gamma distribution for variable evolutionary rates at sites, and rate parameters for different genes; (ii) likelihood ratio test of hypotheses concerning sequence evolution, such as rate constancy and independence among sites and rate constancy among lineages (the molecular clock); (iii) calculation of substitution rates at sites and reconstruction of ancestral nucleotide or amino acid sequences; and (iv) phylogenetic tree reconstruction by maximum likelihood and Bayesian methods. The strength of PAML, in comparison with other phylogenetic packages currently available, is its implementation of a variety of evolutionary models. These include several models of variable evolutionary rates among sites, models for combined analyses of multiple gene sequence data and models for amino acid sequences. Multifurcating trees are supported, as well as trees in which some sequences are ancestral to some others. A heuristic tree search algorithm (star decomposition) is used in the package, but tree making is not a strong point of the current version, although work is under way to implement efficient search algorithms. Major programs in the package, as well as the types of analyses they perform, are listed in Table 1. More details are available in the documentation included in the package, written using Microsoft Word. PAML is distributed free of charge for academic use only. 
The package, including ANSI C source codes, documentation, example data sets, and control files, can be obtained by anonymous ftp at mw511.biol.berkeley.edu/pub, or from the Indiana molecular biology ftp site at ftp.bio.indiana.edu under the directory Incoming or molbio/evolve . MAC and PowerMac executables are also available, although DOS executables are not prepared yet. Further information about the package is available from the World Wide Web at

Journal of Molecular Evolution | 1994

Maximum Likelihood Phylogenetic Estimation from DNA Sequences with Variable Rates over Sites: Approximate Methods

Ziheng Yang

Two approximate methods are proposed for maximum likelihood phylogenetic estimation, which allow variable rates of substitution across nucleotide sites. Three data sets with quite different characteristics were analyzed to examine empirically the performance of these methods. The first, called the “discrete gamma model,” uses several categories of rates to approximate the gamma distribution, with equal probability for each category. The mean of each category is used to represent all the rates falling in the category. The performance of this method is found to be quite good, and four such categories appear to be sufficient to produce both an optimum, or near-optimum fit by the model to the data, and also an acceptable approximation to the continuous distribution. The second method, called “fixed-rates model,” classifies sites into several classes according to their rates predicted assuming the star tree. Sites in different classes are then assumed to be evolving at these fixed rates when other tree topologies are evaluated. Analyses of the data sets suggest that this method can produce reasonable results, but it seems to share some properties of a least-squares pairwise comparison; for example, interior branch lengths in nonbest trees are often found to be zero. The computational requirements of the two methods are comparable to that of Felsenstein's (1981, J Mol Evol 17:368–376) model, which assumes a single rate for all the sites.
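
The discrete gamma construction described in this abstract (equal-probability categories, each represented by its mean rate) can be illustrated with a short sketch. This is not the PAML implementation; it assumes a gamma distribution with shape and rate both equal to alpha, so that the mean rate is 1, and the function names `reg_lower_gamma`, `gamma_quantile`, and `discrete_gamma_rates` are hypothetical helpers written for this example using only the Python standard library.

```python
import math


def reg_lower_gamma(a, x):
    """Regularized lower incomplete gamma P(a, x), summed from its power series."""
    if x <= 0.0:
        return 0.0
    term = 1.0 / a
    total = term
    n = 0
    while abs(term) > 1e-16 * abs(total):
        n += 1
        term *= x / (a + n)
        total += term
    return total * math.exp(a * math.log(x) - x - math.lgamma(a))


def gamma_quantile(p, alpha):
    """Quantile of Gamma(shape=alpha, rate=alpha), found by bisection."""
    lo, hi = 0.0, 1.0
    while reg_lower_gamma(alpha, alpha * hi) < p:
        hi *= 2.0
    for _ in range(200):
        mid = 0.5 * (lo + hi)
        if reg_lower_gamma(alpha, alpha * mid) < p:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


def discrete_gamma_rates(alpha, k):
    """Mean rate of each of k equal-probability categories (assumed mean-1 gamma)."""
    # Category boundaries are the 1/k, 2/k, ... quantiles of the gamma distribution.
    cuts = [gamma_quantile(i / k, alpha) for i in range(1, k)]
    # The partial mean up to a boundary c equals P(alpha + 1, alpha * c)
    # when shape == rate == alpha; it is 0 at zero and 1 at infinity.
    cum = [0.0] + [reg_lower_gamma(alpha + 1.0, alpha * c) for c in cuts] + [1.0]
    # Divide each partial mean by the category probability 1/k.
    return [k * (cum[i + 1] - cum[i]) for i in range(k)]


if __name__ == "__main__":
    for rate in discrete_gamma_rates(0.5, 4):
        print(round(rate, 4))
```

Because every category has probability 1/k, the category means always average to 1, which keeps branch lengths on the same scale as in a single-rate model.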

Trends in Ecology and Evolution | 2000

Statistical methods for detecting molecular adaptation.

Ziheng Yang; Joseph P. Bielawski

The past few years have seen the development of powerful statistical methods for detecting adaptive molecular evolution. These methods compare synonymous and nonsynonymous substitution rates in protein-coding genes, and regard a nonsynonymous rate elevated above the synonymous rate as evidence for darwinian selection. Numerous cases of molecular adaptation are being identified in various systems from viruses to humans. Although previous analyses averaging rates over sites and time have little power, recent methods designed to detect positive selection at individual sites and lineages have been successful. Here, we summarize recent statistical methods for detecting molecular adaptation, and discuss their limitations and possible improvements.

Journal of Molecular Evolution | 1996

Probability Distribution of Molecular Evolutionary Trees: A New Method of Phylogenetic Inference

Bruce Rannala; Ziheng Yang

A new method is presented for inferring evolutionary trees using nucleotide sequence data. The birth-death process is used as a model of speciation and extinction to specify the prior distribution of phylogenies and branching times. Nucleotide substitution is modeled by a continuous-time Markov process. Parameters of the branching model and the substitution model are estimated by maximum likelihood. The posterior probabilities of different phylogenies are calculated and the phylogeny with the highest posterior probability is chosen as the best estimate of the evolutionary relationship among species. We refer to this as the maximum posterior probability (MAP) tree. The posterior probability provides a natural measure of the reliability of the estimated phylogeny. Two example data sets are analyzed to infer the phylogenetic relationship of human, chimpanzee, gorilla, and orangutan. The best trees estimated by the new method are the same as those from the maximum likelihood analysis of separate topologies, but the posterior probabilities are quite different from the bootstrap proportions. The results of the method are found to be insensitive to changes in the rate parameter of the branching process.
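
The final step described in this abstract, turning a prior and a likelihood for each candidate phylogeny into posterior probabilities and picking the MAP tree, can be sketched as follows. This is only an illustration: it assumes the log-likelihood and log-prior of each tree have already been computed (in the paper the prior comes from the birth-death process), and the function name `tree_posteriors` and the numbers in the usage lines are hypothetical.

```python
import math


def tree_posteriors(log_likelihoods, log_priors):
    """Normalize prior x likelihood over a finite set of candidate trees."""
    log_joint = [ll + lp for ll, lp in zip(log_likelihoods, log_priors)]
    # Subtract the maximum before exponentiating to avoid underflow.
    top = max(log_joint)
    weights = [math.exp(v - top) for v in log_joint]
    total = sum(weights)
    return [w / total for w in weights]


if __name__ == "__main__":
    # Hypothetical log-likelihoods for three topologies with a uniform prior.
    lnl = [-1000.0, -1002.0, -1005.0]
    prior = [math.log(1.0 / 3.0)] * 3
    post = tree_posteriors(lnl, prior)
    map_index = max(range(len(post)), key=post.__getitem__)
    print(map_index, [round(p, 4) for p in post])
```

Working in log space matters here because likelihoods of sequence alignments are typically far too small to represent directly as floating-point numbers.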

Trends in Ecology and Evolution | 1996

Among-site rate variation and its impact on phylogenetic analyses

Ziheng Yang

Although several decades of study have revealed the ubiquity of variation of evolutionary rates among sites, reliable methods for studying rate variation were not developed until very recently. Early methods fit theoretical distributions to the numbers of changes at sites inferred by parsimony and substantially underestimate the rate variation. Recent analyses show that failure to account for rate variation can have drastic effects, leading to biased dating of speciation events, biased estimation of the transition:transversion rate ratio, and incorrect reconstruction of phylogenies.

Journal of Molecular Evolution | 1994

Estimating the pattern of nucleotide substitution

Ziheng Yang

Knowledge of the pattern of nucleotide substitution is important both to our understanding of molecular sequence evolution and to reliable estimation of phylogenetic relationships. The method of parsimony analysis, which has been used to estimate substitution patterns in real sequences, has serious drawbacks and leads to results difficult to interpret. In this paper a model-based maximum likelihood approach is proposed for estimating substitution patterns in real sequences. Nucleotide substitution is assumed to follow a homogeneous Markov process, and the general reversible process model (REV) and the unrestricted model without the reversibility assumption are used. These models are also applied to examine the adequacy of the model of Hasegawa et al. (J. Mol. Evol. 1985;22:160–174) (HKY85). Two data sets are analyzed. For the ψη-globin pseudogenes of six primate species, the REV model fits the data much better than HKY85, while, for a segment of mtDNA sequences from nine primates, REV cannot provide a significantly better fit than HKY85 when rate variation over sites is taken into account in the models. It is concluded that the use of the REV model in phylogenetic analysis can be recommended, especially for large data sets or for sequences with extreme substitution patterns, while HKY85 may be expected to provide a good approximation. The use of the unrestricted model does not appear to be worthwhile.
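
As a concrete picture of the HKY85 model mentioned in this abstract, the sketch below builds its instantaneous rate matrix from a transition/transversion rate ratio and a set of base frequencies. It is a minimal illustration under stated assumptions, not code from the paper: the base order T, C, A, G, the scaling to an average rate of one substitution per unit time, and the function name `hky85_rate_matrix` are choices made for this example.

```python
def hky85_rate_matrix(kappa, freqs):
    """Build an HKY85 rate matrix Q for bases ordered T, C, A, G (assumed order)."""
    bases = "TCAG"
    transitions = {("T", "C"), ("C", "T"), ("A", "G"), ("G", "A")}
    q = [[0.0] * 4 for _ in range(4)]
    for i in range(4):
        for j in range(4):
            if i != j:
                # Transitions are scaled by kappa relative to transversions.
                rate = kappa if (bases[i], bases[j]) in transitions else 1.0
                q[i][j] = rate * freqs[j]
        # Each row of a rate matrix sums to zero.
        q[i][i] = -sum(q[i])
    # Rescale so that the expected substitution rate at equilibrium is 1.
    mean_rate = -sum(freqs[i] * q[i][i] for i in range(4))
    return [[value / mean_rate for value in row] for row in q]


if __name__ == "__main__":
    for row in hky85_rate_matrix(2.0, [0.1, 0.2, 0.3, 0.4]):
        print([round(value, 4) for value in row])
```

The matrix satisfies detailed balance (freqs[i] * Q[i][j] equals freqs[j] * Q[j][i]), which is the reversibility property shared with REV; REV relaxes HKY85 by allowing a separate exchangeability for each pair of bases.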

Proceedings of the National Academy of Sciences of the United States of America | 2010

Bayesian species delimitation using multilocus sequence data

Ziheng Yang; Bruce Rannala

In the absence of recent admixture between species, bipartitions of individuals in gene trees that are shared across loci can potentially be used to infer the presence of two or more species. This approach to species delimitation via molecular sequence data has been constrained by the fact that genealogies for individual loci are often poorly resolved and that ancestral lineage sorting, hybridization, and other population genetic processes can lead to discordant gene trees. Here we use a Bayesian modeling approach to generate the posterior probabilities of species assignments taking account of uncertainties due to unknown gene trees and the ancestral coalescent process. For tractability, we rely on a user-specified guide tree to avoid integrating over all possible species delimitations. The statistical performance of the method is examined using simulations, and the method is illustrated by analyzing sequence data from rotifers, fence lizards, and human populations.

Journal of Molecular Evolution | 1998

Synonymous and Nonsynonymous Rate Variation in Nuclear Genes of Mammals

Ziheng Yang; Rasmus Nielsen

A maximum likelihood approach was used to estimate the synonymous and nonsynonymous substitution rates in 48 nuclear genes from primates, artiodactyls, and rodents. A codon-substitution model was assumed, which accounts for the genetic code structure, transition/transversion bias, and base frequency biases at codon positions. Likelihood ratio tests were applied to test the constancy of nonsynonymous to synonymous rate ratios among branches (evolutionary lineages). It is found that at 22 of the 48 nuclear loci examined, the nonsynonymous/synonymous rate ratio varies significantly across branches of the tree. The result provides strong evidence against a strictly neutral model of molecular evolution. Our likelihood estimates of synonymous and nonsynonymous rates differ considerably from previous results obtained from approximate pairwise sequence comparisons. The differences between the methods are explored by detailed analyses of data from several genes. Transition/transversion rate bias and codon frequency biases are found to have significant effects on the estimation of synonymous and nonsynonymous rates, and approximate methods do not adequately account for those factors. The likelihood approach is preferable, even for pairwise sequence comparison, because more realistic models about the mutation and substitution processes can be incorporated in the analysis.
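
The likelihood ratio tests mentioned in this abstract compare a null model (for example, one nonsynonymous/synonymous rate ratio for all branches) with a more general alternative. A minimal sketch of the usual calculation follows. It assumes the two models are nested and that twice the log-likelihood difference is approximately chi-square distributed with degrees of freedom equal to the number of extra parameters; the function names and the log-likelihood values in the usage lines are hypothetical.

```python
import math


def chi2_sf(x, df):
    """Upper-tail probability of a chi-square variable with integer df >= 1."""
    if x <= 0.0:
        return 1.0
    h = x / 2.0
    if df % 2 == 0:
        a, q = 1.0, math.exp(-h)              # Q(1, h)
    else:
        a, q = 0.5, math.erfc(math.sqrt(h))   # Q(1/2, h)
    steps = (df - (2 if df % 2 == 0 else 1)) // 2
    for _ in range(steps):
        # Recurrence: Q(a + 1, h) = Q(a, h) + h**a * exp(-h) / Gamma(a + 1)
        q += math.exp(a * math.log(h) - h - math.lgamma(a + 1.0))
        a += 1.0
    return q


def likelihood_ratio_test(lnl_null, lnl_alt, df):
    """Return the statistic 2 * (lnL_alt - lnL_null) and its chi-square p-value."""
    stat = 2.0 * (lnl_alt - lnl_null)
    return stat, chi2_sf(max(stat, 0.0), df)


if __name__ == "__main__":
    # Hypothetical log-likelihoods: the alternative has two extra parameters.
    stat, p_value = likelihood_ratio_test(-1005.0, -1000.0, 2)
    print(round(stat, 3), round(p_value, 5))
```

The chi-square approximation is a large-sample result and can be unreliable when the null value lies on the boundary of the parameter space, so the p-value here should be read as a rough guide.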

Proceedings of the National Academy of Sciences of the United States of America | 2001

Positive Darwinian selection drives the evolution of several female reproductive proteins in mammals

Willie J. Swanson; Ziheng Yang; Mariana F. Wolfner; Charles F. Aquadro

Rapid evolution driven by positive Darwinian selection is a recurrent theme in male reproductive protein evolution. In contrast, positive selection has never been demonstrated for female reproductive proteins. Here, we perform phylogeny-based tests on three female mammalian fertilization proteins and demonstrate positive selection promoting their divergence. Two of these female fertilization proteins, the zona pellucida glycoproteins ZP2 and ZP3, are part of the mammalian egg coat. Several sites identified in ZP3 as likely to be under positive selection are located in a region previously demonstrated to be involved in species-specific sperm-egg interaction, suggesting the selective pressure is related to male-female interaction. The results provide long-sought evidence for two evolutionary hypotheses: sperm competition and sexual conflict.

Journal of Molecular Evolution | 1996

Maximum-Likelihood Models for Combined Analyses of Multiple Sequence Data

Ziheng Yang

Models of nucleotide substitution were constructed for combined analyses of heterogeneous sequence data (such as those of multiple genes) from the same set of species. The models account for different aspects of the heterogeneity in the evolutionary process of different genes, such as differences in nucleotide frequencies, in substitution rate bias (for example, the transition/transversion rate bias), and in the extent of rate variation across sites. Model parameters were estimated by maximum likelihood and the likelihood ratio test was used to test hypotheses concerning sequence evolution, such as rate constancy among lineages (the assumption of a molecular clock) and proportionality of branch lengths for different genes. The example data from a segment of the mitochondrial genome of six hominoid species (human, common and pygmy chimpanzees, gorilla, orangutan, and siamang) were analyzed. Nucleotides at the three codon positions in the protein-coding regions and from the tRNA-coding regions were considered heterogeneous data sets. Statistical tests showed that the amount of evolution in the sequence data reflected in the estimated branch lengths can be explained by the codon-position effect and lineage effect of substitution rates. The assumption of a molecular clock could not be rejected when the data were analyzed separately or when the rate variation among sites was ignored. However, significant differences in substitution rate among lineages were found when the data sets were combined and when the rate variation among sites was accounted for in the models. Under the assumption that the orangutan and African apes diverged 13 million years ago, the combined analysis of the sequence data estimated the times for the human-chimpanzee separation and for the separation of the gorilla as 4.3 and 6.8 million years ago, respectively.
