Publication


Featured research published by Yao-ban Chan.


Journal of Statistical Physics | 2011

The Ising susceptibility scaling function

Yao-ban Chan; A J Guttmann; B. G. Nickel; Jacques H.H. Perk

We have dramatically extended the zero field susceptibility series at both high and low temperature of the Ising model on the triangular and honeycomb lattices, and used these data and newly available further terms for the square lattice to calculate a number of terms in the scaling function expansion around both the ferromagnetic and, for the square and honeycomb lattices, the antiferromagnetic critical point.


Frontiers in Microbiology | 2017

Robust inference of genetic exchange communities from microbial genomes using TF-IDF

Yingnan Cong; Yao-ban Chan; Charles A. Phillips; Michael A. Langston; Mark A. Ragan

Bacteria and archaea can exchange genetic material across lineages through processes of lateral genetic transfer (LGT). Collectively, these exchange relationships can be modeled as a network and analyzed using concepts from graph theory. In particular, densely connected regions within an LGT network have been defined as genetic exchange communities (GECs). However, it has been problematic to construct networks in which edges solely represent LGT. Here we apply term frequency-inverse document frequency (TF-IDF), an alignment-free method originating from document analysis, to infer regions of lateral origin in bacterial genomes. We examine four empirical datasets of different size (number of genomes) and phyletic breadth, varying a key parameter (word length k) within bounds established in previous work. We map the inferred lateral regions to genes in recipient genomes, and construct networks in which the nodes are groups of genomes, and the edges natively represent LGT. We then extract maximum and maximal cliques (i.e., GECs) from these graphs, and identify nodes that belong to GECs across a wide range of k. Most surviving lateral transfer has happened within these GECs. Using Gene Ontology enrichment tests we demonstrate that biological processes associated with metabolism, regulation and transport are often over-represented among the genes affected by LGT within these communities. These enrichments are largely robust to change of k.
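The clique-extraction step described in this abstract can be illustrated with a minimal sketch. It uses the basic Bron-Kerbosch recursion (without pivoting) to enumerate maximal cliques and then picks a maximum clique by size. The toy network, the node names, and the choice to treat LGT edges as undirected are assumptions made for illustration; the abstract does not state which clique algorithm or software the authors used.

```python
def maximal_cliques(adj):
    """Enumerate all maximal cliques of an undirected graph.

    adj maps each node to the set of its neighbours.
    Basic Bron-Kerbosch recursion without pivoting.
    """
    cliques = []

    def expand(r, p, x):
        # r: current clique, p: candidates, x: already-processed nodes
        if not p and not x:
            cliques.append(frozenset(r))
            return
        for v in list(p):
            expand(r | {v}, p & adj[v], x & adj[v])
            p = p - {v}
            x = x | {v}

    expand(set(), set(adj), set())
    return cliques


# Hypothetical toy network: nodes stand for groups of genomes,
# edges for inferred lateral transfers (treated as undirected here).
edges = [("A", "B"), ("A", "C"), ("B", "C"), ("C", "D")]
adj = {}
for u, v in edges:
    adj.setdefault(u, set()).add(v)
    adj.setdefault(v, set()).add(u)

cliques = maximal_cliques(adj)
maximum = max(cliques, key=len)  # a largest clique among the maximal ones
```

In this toy graph the maximal cliques are {A, B, C} and {C, D}, and the maximum clique is {A, B, C}. Repeating the extraction for several values of k and intersecting the results would be one way to find nodes that persist across k, though that step is not shown here.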


Journal of the American Statistical Association | 2010

Using Evidence of Mixed Populations to Select Variables for Clustering Very High-Dimensional Data

Yao-ban Chan; Peter Hall

In this paper we develop a nonparametric approach to clustering very high-dimensional data, designed particularly for problems where the mixture nature of a population is expressed through multimodality of its density. Therefore, a technique based implicitly on mode testing can be particularly effective. In principle, several alternative approaches could be used to assess the extent of multimodality, but in the present problem the excess mass method has important advantages. We show that the resulting methodology for determining clusters is particularly effective in cases where the data are relatively heavy tailed or show a moderate to high degree of correlation, or when the number of important components is relatively small. Conversely, in the case of light-tailed, almost-independent components when there are many clusters, clustering in terms of modality can be less reliable than more conventional approaches. This article has supplementary material online.


Journal of Physics A | 2012

Series expansions from the corner transfer matrix renormalization group method: the hard-squares model

Yao-ban Chan

The corner transfer matrix renormalization group method is an efficient method for evaluating physical quantities in statistical mechanical models. It originates from Baxter’s corner transfer matrix equations and method, and was developed by Nishino and Okunishi in 1996. In this paper, we review and adapt this method, previously used for numerical calculations, to derive series expansions. We use this to calculate 92 terms of the partition function of the hard-squares model. We use the resulting series to provide evidence supporting the claim that the method is subexponential in the number of generated terms, and briefly analyse the singularities of the function.


Scientific Reports | 2016

A novel alignment-free method for detection of lateral genetic transfer based on TF-IDF

Yingnan Cong; Yao-ban Chan; Mark A. Ragan

Lateral genetic transfer (LGT) plays an important role in the evolution of microbes. Existing computational methods for detecting genomic regions of putative lateral origin scale poorly to large data. Here, we propose a novel method based on TF-IDF (Term Frequency-Inverse Document Frequency) statistics to detect not only regions of lateral origin, but also their origin and direction of transfer, in sets of hierarchically structured nucleotide or protein sequences. This approach is based on the frequency distributions of k-mers in the sequences. If a set of contiguous k-mers appears sufficiently more frequently in another phyletic group than in its own, we infer that they have been transferred from the first group to the second. We performed rigorous tests of TF-IDF using simulated and empirical datasets. With the simulated data, we tested our method under different parameter settings for sequence length, substitution rate between and within groups and post-LGT, deletion rate, length of transferred region and k size, and found that we can detect LGT events with high precision and recall. Our method performs better than an established method, ALFY, which has high recall but low precision. Our method is efficient, with runtime increasing approximately linearly with sequence length.
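The inference rule in this abstract (contiguous k-mers that are markedly more frequent in another group than in the sequence's own group) can be sketched as follows. This is a simplified frequency-ratio illustration, not the published TF-IDF statistic: the function names, the `ratio` and `min_kmers` parameters, and the toy sequences are all hypothetical, and sequence names are assumed to be unique across groups.

```python
from collections import defaultdict


def kmers(seq, k):
    """All overlapping k-mers of seq, in order."""
    return [seq[i:i + k] for i in range(len(seq) - k + 1)]


def group_kmer_freq(groups, k, skip=None):
    """Per group: fraction of sequences containing each k-mer.

    The sequence named `skip` (the query) is left out of the counts.
    """
    freq = {}
    for g, seqs in groups.items():
        counts, n = defaultdict(int), 0
        for name, s in seqs.items():
            if name == skip:
                continue
            n += 1
            for w in set(kmers(s, k)):
                counts[w] += 1
        freq[g] = {w: c / n for w, c in counts.items()} if n else {}
    return freq


def lateral_regions(groups, own, name, k=4, min_kmers=3, ratio=2.0):
    """Return (start, end, donor_group) for runs of contiguous k-mers that
    are more than `ratio` times as frequent in another group as in `own`."""
    query = groups[own][name]
    freq = group_kmer_freq(groups, k, skip=name)
    regions, run_start, run_donor = [], None, None
    for i, w in enumerate(kmers(query, k) + [None]):  # None closes the last run
        donor = None
        if w is not None:
            f_own = freq[own].get(w, 0.0)
            others = {g: f[w] for g, f in freq.items() if g != own and w in f}
            if others:
                best = max(others, key=others.get)
                if others[best] > ratio * f_own:
                    donor = best
        if donor != run_donor:
            if run_donor is not None and i - run_start >= min_kmers:
                regions.append((run_start, i - 1 + k, run_donor))
            run_start, run_donor = i, donor
    return regions


# Hypothetical toy data: g1_q carries an insert typical of group G2.
groups = {
    "G1": {"g1_a": "ACACACACACACACAC", "g1_b": "ACACACACACACACAC",
           "g1_q": "ACACACGGTTGGTTACACAC"},
    "G2": {"g2_a": "GGTTGGTTGGTTGGTT", "g2_b": "GGTTGGTTGGTTGGTT"},
}
regions = lateral_regions(groups, "G1", "g1_q")
```

For the toy data the sketch reports one region, positions 6 to 14 of `g1_q`, with G2 as the inferred donor. K-mers spanning the junctions occur in neither group, so they are left unflagged; the gap-size parameter discussed in the related abstract is the kind of mechanism that would address that, and it is not modeled here.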


Scientific Reports | 2016

Exploring lateral genetic transfer among microbial genomes using TF-IDF

Yingnan Cong; Yao-ban Chan; Mark A. Ragan

Many microbes can acquire genetic material from their environment and incorporate it into their genome, a process known as lateral genetic transfer (LGT). Computational approaches have been developed to detect genomic regions of lateral origin, but typically lack sensitivity, ability to distinguish donor from recipient, and scalability to very large datasets. To address these issues we have introduced an alignment-free method based on ideas from document analysis, term frequency-inverse document frequency (TF-IDF). Here we examine the performance of TF-IDF on three empirical datasets: 27 genomes of Escherichia coli and Shigella, 110 genomes of enteric bacteria, and 143 genomes across 12 bacterial and three archaeal phyla. We investigate the effect of k-mer size, gap size and delineation of groups on the inference of genomic regions of lateral origin, finding an interplay among these parameters and sequence divergence. Because TF-IDF identifies donor groups and delineates regions of lateral origin within recipient genomes, aggregating these regions by gene enables us to explore, for the first time, the mosaic nature of lateral genes including the multiplicity of biological sources, ancestry of transfer and over-writing by subsequent transfers. We carry out Gene Ontology enrichment tests to investigate which biological processes are potentially affected by LGT.


Journal of Physics A | 2013

Series expansions from the corner transfer matrix renormalization group method: II. Asymmetry and high-density hard squares

Yao-ban Chan

The corner transfer matrix renormalization group method is a method, claimed to be subexponential, for numerically calculating physical quantities of statistical mechanical models. In a previous paper, we extended this method to generate series expansions. Here, we show how to extend both the original and series methods to deal with asymmetry in any model. We discuss the cases of rotational, translational and reflection asymmetry, and give some improvements to the method. This is demonstrated by an application of the method to generate series for the hard square model in the high-density regime, producing 51 terms of the partition function and 48 terms of the order parameter. These series are analysed, producing estimates of the critical point and exponents, and showing the likely presence of a confluent singularity with exponent 17/8 in the order parameter.


Briefings in Bioinformatics | 2017

Alignment-free inference of hierarchical and reticulate phylogenomic relationships

Guillaume Bernard; Cheong Xin Chan; Yao-ban Chan; Xin-Yi Chua; Yingnan Cong; James M. Hogan; Stefan Maetschke; Mark A. Ragan

We are amidst an ongoing flood of sequence data arising from the application of high-throughput technologies, and a concomitant fundamental revision in our understanding of how genomes evolve individually and within the biosphere. Workflows for phylogenomic inference must accommodate data that are not only much larger than before, but often more error prone and perhaps misassembled, or not assembled in the first place. Moreover, genomes of microbes, viruses and plasmids evolve not only by tree-like descent with modification but also by incorporating stretches of exogenous DNA. Thus, next-generation phylogenomics must address computational scalability while rethinking the nature of orthogroups, the alignment of multiple sequences and the inference and comparison of trees. New phylogenomic workflows have begun to take shape based on so-called alignment-free (AF) approaches. Here, we review the conceptual foundations of AF phylogenetics for the hierarchical (vertical) and reticulate (lateral) components of genome evolution, focusing on methods based on k-mers. We reflect on what seems to be successful, and on where further development is needed.


Annals of Statistics | 2009

Robust nearest-neighbor methods for classifying high-dimensional data

Yao-ban Chan; Peter Hall

We suggest a robust nearest-neighbor approach to classifying high-dimensional data. The method enhances sensitivity by employing a threshold and truncates to a sequence of zeros and ones in order to reduce the deleterious impact of heavy-tailed data. Empirical rules are suggested for choosing the threshold. They require the bare minimum of data; only one data vector is needed from each population. Theoretical and numerical aspects of performance are explored, paying particular attention to the impacts of correlation and heterogeneity among data components. On the theoretical side, it is shown that our truncated, thresholded, nearest-neighbor classifier enjoys the same classification boundary as more conventional, nonrobust approaches, which require finite moments in order to achieve good performance. In particular, the greater robustness of our approach does not come at the price of reduced effectiveness. Moreover, when both training sample sizes equal 1, our new method can have performance equal to that of optimal classifiers that require independent and identically distributed data with known marginal distributions; yet, our classifier does not itself need conditions of this type.


BMC Bioinformatics | 2013

Reconciliation-based detection of co-evolving gene families

Yao-ban Chan; Vincent Ranwez; Celine Scornavacca

Background: Genes located in the same chromosome region share common evolutionary events more often than other genes (e.g. a segmental duplication of this region). Their evolution may also be related if they are involved in the same protein complex or biological process. Identifying co-evolving genes can thus shed light on ancestral genome structures and functional gene interactions. Results: We devise a simple, fast and accurate probability method based on species tree-gene tree reconciliations to detect when two gene families have co-evolved. Our method observes the number and location of predicted macro-evolutionary events, and estimates the probability of having the observed number of common events by chance. Conclusions: Simulation studies confirm that our method effectively identifies co-evolving families. This opens numerous perspectives on genome-scale analysis where this method could be used to pinpoint co-evolving gene families and thus help to unravel ancestral genome arrangements or undocumented gene interactions.
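The idea of estimating how likely a given number of common events is by chance can be illustrated with a hypergeometric tail probability. This is only a sketch under a strong assumption that is not taken from the paper: events of each family are treated as falling on distinct, equally likely locations of the species tree, independently between families. The function and parameter names are hypothetical.

```python
from math import comb


def shared_event_pvalue(n_locations, events_a, events_b, shared):
    """P(at least `shared` common locations) when family B's events are
    placed uniformly at random among n_locations, of which events_a are
    already occupied by family A (hypergeometric upper tail)."""
    total = comb(n_locations, events_b)
    upper = min(events_a, events_b)
    tail = sum(
        comb(events_a, s) * comb(n_locations - events_a, events_b - s)
        for s in range(shared, upper + 1)
    )
    return tail / total


# Hypothetical numbers: 10 candidate locations, 3 events per family,
# all 3 of them shared.
p = shared_event_pvalue(10, 3, 3, 3)
```

A small value of `p` would suggest that the overlap is unlikely under this null model. The published method works from reconciliation output and may weight locations and event types differently.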

Collaboration


Dive into Yao-ban Chan's collaborations.

Top Co-Authors

Andrew Rechnitzer, University of British Columbia

Mark A. Ragan, University of Queensland

Yingnan Cong, University of Queensland

A J Guttmann, University of Melbourne

Peter Hall, University of Melbourne

A L Owczarek, University of Melbourne