Publication


Featured research published by Von Bing Yap.


Pacific Symposium on Biocomputing | 2001

Scoring pairwise genomic sequence alignments

Francesca Chiaromonte; Von Bing Yap; Webb Miller

The parameters by which alignments are scored can strongly affect sensitivity and specificity of alignment procedures. While appropriate parameter choices are well understood for protein alignments, much less is known for genomic DNA sequences. We describe a straightforward approach to scoring nucleotide substitutions in genomic sequence alignments, especially human-mouse comparisons. Scores are obtained from relative frequencies of aligned nucleotides observed in alignments of non-coding, non-repetitive genomic regions, and can be theoretically motivated through substitution models. Additional accuracy can be attained by down-weighting alignments characterized by low compositional complexity. We also describe an evaluation protocol that is relevant when alignments are intended to identify all and only the orthologous positions. One particular scoring matrix, called HOXD70, has proven to be generally effective for human-mouse comparisons, and has been used by the PipMaker server since July, 2000. We discuss but leave open the problem of effectively scoring regions of strongly biased nucleotide composition, such as low G + C content.
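The idea of deriving scores from relative frequencies of aligned nucleotides can be sketched as a log-odds calculation. This is a minimal illustration only: the function name `log_odds_scores`, the base-2 logarithm, and the `scale` parameter are assumptions made here, and the sketch does not reproduce the HOXD70 matrix or the authors' exact procedure.

```python
import math
from collections import Counter


def log_odds_scores(aligned_pairs, scale=1.0):
    """Score each observed nucleotide pair as scale * log2(p_ab / (p_a * p_b)).

    aligned_pairs: iterable of (base_in_seq1, base_in_seq2) tuples taken from
    alignment columns (hypothetical input format chosen for this sketch).
    """
    pair_counts = Counter(aligned_pairs)
    total = sum(pair_counts.values())

    # Marginal base counts in each of the two sequences.
    first = Counter()
    second = Counter()
    for (a, b), count in pair_counts.items():
        first[a] += count
        second[b] += count

    scores = {}
    for (a, b), count in pair_counts.items():
        observed = count / total
        expected = (first[a] / total) * (second[b] / total)
        scores[(a, b)] = scale * math.log2(observed / expected)
    return scores


# Toy data: matches are more frequent than chance, mismatches less frequent.
pairs = [("A", "A")] * 6 + [("A", "G")] * 2 + [("G", "G")] * 6 + [("G", "A")] * 2
scores = log_odds_scores(pairs)
```

Pairs that co-occur more often than their marginal frequencies predict receive positive scores, and pairs that co-occur less often receive negative scores.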


Proceedings of the National Academy of Sciences of the United States of America | 2001

Association between divergence and interspersed repeats in mammalian noncoding genomic DNA

Francesca Chiaromonte; Shan Yang; Laura Elnitski; Von Bing Yap; Webb Miller; Ross C. Hardison

The amount of noncoding genomic DNA sequence that aligns between human and mouse varies substantially in different regions of their genomes, and the amount of repetitive DNA also varies. In this report, we show that divergence in noncoding nonrepetitive DNA is strongly correlated with the amount of repetitive DNA in a region. We investigated aligned DNA in four large genomic regions with finished human sequence and almost or completely finished mouse sequence. These regions, totaling 5.89 Mb of DNA, are on different chromosomes and vary in their base composition. An analysis based on sliding windows of 10 kb shows that the fraction of aligned noncoding nonrepetitive DNA and the fraction of repetitive DNA are negatively correlated, both at the level of an entire region and locally within it. This conclusion is strongly supported by a randomization study, in which repetitive elements are removed and randomly relocated along the sequences. Thus, regions of noncoding genomic DNA that accumulated fewer point mutations since the primate–rodent divergence also suffered fewer retrotransposition events. These results indicate that some regions of the genome are more “flexible” over the time scale of mammalian evolution, being able to accommodate many point mutations and insertions, whereas other regions are more “rigid” and accumulate fewer changes. Stronger conservation is generally interpreted as indicating more extensive or more important function. The evidence presented here of correlated variation in the rates of different evolutionary processes across noncoding DNA must be considered in assessing such conservation for evidence of selection.


Biodiversity and Conservation | 2010

The extent of undiscovered species in Southeast Asia

Xingli Giam; Ting Hui Ng; Von Bing Yap; Hugh T. W. Tan

Southeast Asia has the highest rate of deforestation among all tropical regions in the world. Depending on the number of undiscovered species not yet known to science, a sizeable proportion of species may have gone extinct or will go extinct in the future without record. We compiled species datasets for eight taxa, each consisting of a list of native species and their description dates. Birds, legumes, mosquitoes, and mosses showed recent declines in species discovery rate. For these taxa, we estimated the total species richness by applying generalized linear models derived from theory. The number of undiscovered species in each taxon was calculated and the extent of undiscovered species among the taxa compared. Among these taxa that displayed a species discovery decline, the legumes had the highest extent of undiscovered species while the birds had the most complete species inventory. Although quantitative estimates of the number of undiscovered species for amphibians, freshwater fish, hawkmoths, and mammals could not be derived, the extent of undiscovered species is likely to be high as their recent discovery rates showed a continued increase. If these taxa are more or less representative of other Southeast Asian taxa, many species are likely to go extinct before ever being discovered by science under the current rates of habitat loss. We therefore urge the intensification of taxonomic and species discovery research in the taxa in which the extent of undiscovered species is relatively high, i.e., amphibians, freshwater fish, hawkmoths, mammals, and legumes.


BMC Evolutionary Biology | 2005

Rooting a phylogenetic tree with nonreversible substitution models

Von Bing Yap; Terry Speed

Background: We compared two methods of rooting a phylogenetic tree: the stationary and the nonstationary substitution processes. These methods do not require an outgroup. Methods: Given a multiple alignment and an unrooted tree, the maximum likelihood estimates of branch lengths and substitution parameters for each associated rooted tree are found; rooted trees are compared using their likelihood values. Site variation in substitution rates is handled by assigning sites into several classes before the analysis. Results: In three test datasets where the trees are small and the roots are assumed known, the nonstationary process gets the correct estimate significantly more often, and fits data much better, than the stationary process. Both processes give biologically plausible root placements in a set of nine primate mitochondrial DNA sequences. Conclusions: The nonstationary process is simple to use and is much better than the stationary process at inferring the root. It could be useful for situations where an outgroup is unavailable.


Molecular Biology and Evolution | 2010

Estimates of the Effect of Natural Selection on Protein-Coding Content

Von Bing Yap; Helen Lindsay; Simon Easteal; Gavin A. Huttley

Analysis of natural selection is key to understanding many core biological processes, including the emergence of competition, cooperation, and complexity, and has important applications in the targeted development of vaccines. Selection is hard to observe directly but can be inferred from molecular sequence variation. For protein-coding nucleotide sequences, the ratio of nonsynonymous to synonymous substitutions (ω) distinguishes neutrally evolving sequences (ω = 1) from those subjected to purifying (ω < 1) or positive Darwinian (ω > 1) selection. We show that current models used to estimate ω are substantially biased by naturally occurring sequence compositions. We present a novel model that weights substitutions by conditional nucleotide frequencies and which escapes these artifacts. Applying it to the genomes of pathogens causing malaria, leprosy, tuberculosis, and Lyme disease gave significant discrepancies in estimates with ∼10–30% of genes affected. Our work has substantial implications for how vaccine targets are chosen and for studying the molecular basis of adaptive evolution.


Linear Algebra and its Applications | 1998

Matrix extension and biorthogonal multiwavelet construction

Say Song Goh; Von Bing Yap

Suppose that P(z) and P̃(z) are two r × n matrices over the Laurent polynomial ring R[z], where r < n, satisfying the identity P(z) P̃(z)* = I_r on the unit circle T. We develop an algorithm that produces two n × n matrices Q(z) and Q̃(z) over R[z], satisfying the identity Q(z) Q̃(z)* = I_n on T, such that the submatrices formed by the first r rows of Q(z) and Q̃(z) are P(z) and P̃(z), respectively. Our algorithm is used to construct compactly supported biorthogonal multiwavelets from multiresolutions generated by univariate compactly supported biorthogonal scaling functions with an arbitrary dilation parameter m ∈ Z, where m > 1.


BMC Bioinformatics | 2009

Effects of normalization on quantitative traits in association test

Liang Goh; Von Bing Yap

Background: Quantitative trait loci analysis assumes that the trait is normally distributed. In reality, this is often not observed, and one strategy is to transform the trait. However, it is not clear how much normality is required and which transformation works best in association studies. Results: We performed simulations on four types of common quantitative traits to evaluate the effects of normalization using the logarithm, Box-Cox, and rank-based transformations. The impact of sample size and genetic effects on normalization is also investigated. Our results show that rank-based transformation generally gives the best and most consistent performance in identifying the causal polymorphism and ranking it highly in association tests, with a slight increase in false positive rate. Conclusion: For small sample size or genetic effects, the improvement in sensitivity for rank transformation outweighs the slight increase in false positive rate. However, for large sample size and genetic effects, normalization may not be necessary since the increase in sensitivity is relatively modest.
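One common form of rank-based transformation is the rank-based inverse normal transform, sketched below. The abstract does not say which variant was used, so the Blom offset `c = 3/8`, the averaging of tied ranks, and the function name `rank_inverse_normal` are all assumptions made for illustration.

```python
from statistics import NormalDist


def rank_inverse_normal(values, c=0.375):
    """Map trait values to normal quantiles of their (tie-averaged) ranks."""
    n = len(values)
    order = sorted(range(n), key=lambda i: values[i])

    # Assign 1-based ranks, giving tied values the average of their ranks.
    ranks = [0.0] * n
    i = 0
    while i < n:
        j = i
        while j + 1 < n and values[order[j + 1]] == values[order[i]]:
            j += 1
        average_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = average_rank
        i = j + 1

    # Convert each rank to a probability strictly inside (0, 1), then to a z-score.
    standard_normal = NormalDist()
    return [standard_normal.inv_cdf((r - c) / (n - 2 * c + 1)) for r in ranks]


# A skewed toy trait: the transform keeps the ordering but removes the skew.
trait = [1.0, 100.0, 2.0, 50.0, 3.0]
z = rank_inverse_normal(trait)
```

Because only the ranks enter the calculation, extreme trait values such as 100.0 have no more influence than their position in the ordering.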


Biology Direct | 2008

Pitfalls of the most commonly used models of context dependent substitution

Helen Lindsay; Von Bing Yap; Hua Ying; Gavin A. Huttley

Background: Neighboring nucleotides exert a striking influence on mutation, with the hypermutability of CpG dinucleotides in many genomes being an exemplar. Among the approaches employed to measure the relative importance of sequence neighbors on molecular evolution have been continuous-time Markov process models for substitutions that treat sequences as a series of independent tuples. The most widely used examples are the codon substitution models. We evaluated the suitability of derivatives of the nucleotide frequency weighted (hereafter NF) and tuple frequency weighted (hereafter TF) models for measuring sequence context dependent substitution. Critical properties we address are their relationships to an independent nucleotide process and the robustness of parameter estimation to changes in sequence composition. We then consider the impact on inference concerning dinucleotide substitution processes from application of these two forms to intron sequence alignments from primates. Results: We prove that the NF form always nests the independent nucleotide process and that this is not true for the TF form. As a consequence, using TF to study context effects can be misleading, which is shown by both theoretical calculations and simulations. We describe a simple example where a context parameter estimated under TF is confounded with composition terms unless all sequence states are equi-frequent. We illustrate this for the dinucleotide case by simulation under a nucleotide model, showing that the TF form identifies a CpG effect when none exists. Our analysis of primate introns revealed that the effect of nucleotide neighbors is over-estimated under TF compared with NF. Parameter estimates for a number of contexts are also strikingly discordant between the two model forms. Conclusion: Our results establish that the NF form should be used for analysis of independent-tuple context dependent processes. Although neighboring effects in general are still important, prominent influences such as the elevated CpG transversion rate previously identified using the TF form are an artifact. Our results further suggest as few as 5 parameters may account for ~85% of neighboring nucleotide influence. Reviewers: This article was reviewed by Dr Rob Knight, Dr Josh Cherry (nominated by Dr David Lipman) and Dr Stephen Altschul (nominated by Dr David Lipman).


Journal of Molecular Evolution | 2004

Modeling DNA Base Substitution in Large Genomic Regions from Two Organisms

Von Bing Yap; Terence P. Speed

We studied the substitution patterns in 7661 well-conserved human–mouse alignments corresponding to the intergenic regions of human chromosome 22. Alignments with a high average GC content tend to have a higher human GC content than mouse GC content, indicating a lack of stationarity. Segmenting the alignments into four groups of GC content and fitting the general reversible substitution model (REV) separately gave significantly better fits than the overall fit and the levels of fit are close to that expected under an REV model. In addition, most of the fitted rate matrices are not of the HKY type but are remarkably strand-symmetric, and we constructed a number of substitution matrices that should be useful for genomic DNA sequence alignment. We did not find obvious signs of temporal inhomogeneity in the substitution rates and concluded that the conserved intergenic regions in human chromosome 22 and mouse appear to have evolved from their common ancestors via a process that is approximately reversible and strand-symmetric, assuming site homogeneity and independence.


Systematic Biology | 2015

Genetic Distance for a General Non-Stationary Markov Substitution Process

Benjamin Kaehler; Von Bing Yap; Rongli Zhang; Gavin A. Huttley

The genetic distance between biological sequences is a fundamental quantity in molecular evolution. It pertains to questions of rates of evolution, existence of a molecular clock, and phylogenetic inference. Under the class of continuous-time substitution models, the distance is commonly defined as the expected number of substitutions at any site in the sequence. We eschew the almost ubiquitous assumptions of evolution under stationarity and time-reversible conditions and extend the concept of the expected number of substitutions to nonstationary Markov models where the only remaining constraint is of time homogeneity between nodes in the tree. Our measure of genetic distance reduces to the standard formulation if the data in question are consistent with the stationarity assumption. We apply this general model to samples from across the tree of life to compare distances so obtained with those from the general time-reversible model, with and without rate heterogeneity across sites, and the paralinear distance, an empirical pairwise method explicitly designed to address nonstationarity. We discover that estimates from both variants of the general time-reversible model and the paralinear distance systematically overestimate genetic distance and departure from the molecular clock. The magnitude of the distance bias is proportional to departure from stationarity, which we demonstrate to be associated with longer edge lengths. The marked improvement in consistency between the general nonstationary Markov model and sequence alignments leads us to conclude that analyses of evolutionary rates and phylogenies will be substantively improved by application of this model.
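One way to read "expected number of substitutions" without assuming stationarity is to integrate the instantaneous substitution rate along an edge while the state distribution changes. The sketch below rests on that assumption: it approximates the integral of -Σ_i π_i(s) q_ii for s from 0 to t, with π(s) evolving as dπ/ds = πQ. The function names, the numerical scheme (Runge-Kutta steps plus a trapezoid rule), and the F81-style example matrix are illustrative choices, not the authors' implementation.

```python
def expected_substitutions(pi0, Q, t, steps=1000):
    """Approximate the expected substitutions per site over time t.

    pi0: state distribution at the start of the edge (need not be stationary).
    Q:   time-homogeneous rate matrix with rows summing to zero.
    """
    n = len(pi0)

    def flow(p):
        # d(pi)/ds = pi Q (row vector times rate matrix).
        return [sum(p[i] * Q[i][j] for i in range(n)) for j in range(n)]

    def rate(p):
        # Instantaneous substitution rate: -sum_i pi_i * q_ii.
        return -sum(p[i] * Q[i][i] for i in range(n))

    def shifted(p, k, w):
        return [p[i] + w * k[i] for i in range(n)]

    h = t / steps
    pi = list(pi0)
    total = 0.0
    for _ in range(steps):
        # Classical fourth-order Runge-Kutta step for the distribution.
        k1 = flow(pi)
        k2 = flow(shifted(pi, k1, h / 2))
        k3 = flow(shifted(pi, k2, h / 2))
        k4 = flow(shifted(pi, k3, h))
        nxt = [pi[i] + h / 6 * (k1[i] + 2 * k2[i] + 2 * k3[i] + k4[i]) for i in range(n)]
        # Trapezoid rule for the accumulated substitution rate.
        total += h / 2 * (rate(pi) + rate(nxt))
        pi = nxt
    return total


def f81_rate_matrix(freqs):
    """F81-style matrix: q_ij = freqs[j] for i != j, q_ii = freqs[i] - 1."""
    n = len(freqs)
    return [[freqs[j] if i != j else freqs[j] - 1.0 for j in range(n)] for i in range(n)]


freqs = [0.1, 0.2, 0.3, 0.4]
Q = f81_rate_matrix(freqs)
d_stationary = expected_substitutions(freqs, Q, 1.0)
d_nonstationary = expected_substitutions([1.0, 0.0, 0.0, 0.0], Q, 1.0)
```

When the starting distribution equals the stationary distribution of Q, the integrand is constant and the result matches the standard formulation -t Σ_i π_i q_ii. Starting away from stationarity gives a different value for the same Q and t.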

Collaboration


Top Co-Authors

Gavin A. Huttley, Australian National University
Hugh T. W. Tan, National University of Singapore
Kwek Yan Chong, National University of Singapore
Mark B. Raphael, National University of Singapore
Rongli Zhang, National University of Singapore
Francesca Chiaromonte, Pennsylvania State University
Rob Knight, University of California
Terry Speed, University of California
Webb Miller, Pennsylvania State University