Publication


Featured research published by Md. Shamsuzzoha Bayzid.


Bioinformatics | 2014

ASTRAL: genome-scale coalescent-based species tree estimation

S. Mirarab; R. Reaz; Md. Shamsuzzoha Bayzid; T. Zimmermann; M. S. Swenson; Tandy J. Warnow

Motivation: Species trees provide insight into basic biology, including the mechanisms of evolution and how it modifies biomolecular function and structure, biodiversity and co-evolution between genes and species. Yet, gene trees often differ from species trees, creating challenges to species tree estimation. One of the most frequent causes for conflicting topologies between gene trees and species trees is incomplete lineage sorting (ILS), which is modelled by the multi-species coalescent. While many methods have been developed to estimate species trees from multiple genes, some of which have statistical guarantees under the multi-species coalescent model, existing methods are too computationally intensive for use with genome-scale analyses or have been shown to have poor accuracy under some realistic conditions. Results: We present ASTRAL, a fast method for estimating species trees from multiple genes. ASTRAL is statistically consistent, can run on datasets with thousands of genes and has outstanding accuracy, improving on MP-EST and the population tree from BUCKy, two statistically consistent leading coalescent-based methods. ASTRAL is often more accurate than concatenation using maximum likelihood, except when ILS levels are low or there are too few gene trees. Availability and implementation: ASTRAL is available in open source form at https://github.com/smirarab/ASTRAL/. Datasets studied in this article are available at http://www.cs.utexas.edu/users/phylo/datasets/astral. Supplementary information: Supplementary data are available at Bioinformatics online.


Science | 2014

Statistical binning enables an accurate coalescent-based estimation of the avian tree

Siavash Mirarab; Md. Shamsuzzoha Bayzid; Bastien Boussau; Tandy J. Warnow

Introduction: Reconstructing species trees for rapid radiations, as in the early diversification of birds, is complicated by biological processes such as incomplete lineage sorting (ILS) that can cause different parts of the genome to have different evolutionary histories. Statistical methods that are based on the multispecies coalescent model and combine gene trees can be highly accurate even in the presence of massive ILS; however, these methods can produce species trees that are topologically far from the species tree when estimated gene trees have error. We have developed a statistical binning technique to address gene tree estimation error and have explored its use in genome-scale species tree estimation with MP-EST, a popular coalescent-based species tree estimation method. [Figure: The statistical binning pipeline for estimating species trees from gene trees. Loci are grouped into bins based on a statistical test for combinability, before estimating gene trees.] Rationale: In statistical binning, phylogenetic trees on different genes are estimated and then placed into bins, so that the differences between trees in the same bin can be explained by estimation error (see the figure). A new tree is then estimated for each bin by applying maximum likelihood to a concatenated alignment of the multiple sequence alignments of its genes, and a species tree is estimated using a coalescent-based species tree method from these supergene trees. Results: Under realistic conditions in our simulation study, statistical binning reduced the topological error of species trees estimated using MP-EST and enabled a coalescent-based analysis that was more accurate than concatenation even when gene tree estimation error was relatively high. Statistical binning also reduced the error in gene tree topology and species tree branch length estimation, especially when the phylogenetic signal in gene sequence alignments was low. Species trees estimated using MP-EST with statistical binning on four biological data sets showed increased concordance with the biological literature. When MP-EST was used to analyze 14,446 gene trees in the avian phylogenomics project, it produced a species tree that was discordant with the concatenation analysis and conflicted with prior literature. However, the statistical binning analysis produced a tree that was highly congruent with the concatenation analysis and was consistent with the prior scientific literature. Conclusions: Statistical binning reduces the error in species tree topology and branch length estimation because it reduces gene tree estimation error. These improvements are greatest when gene trees have reduced bootstrap support, which was the case for the avian phylogenomics project. Because using unbinned gene trees can result in overestimation of ILS, statistical binning may be helpful in providing more accurate estimations of ILS levels in biological data sets. Thus, statistical binning enables highly accurate species tree estimations, even on genome-scale data sets.

Gene tree incongruence arising from incomplete lineage sorting (ILS) can reduce the accuracy of concatenation-based estimations of species trees. Although coalescent-based species tree estimation methods can have good accuracy in the presence of ILS, they are sensitive to gene tree estimation error. We propose a pipeline that uses bootstrapping to evaluate whether two genes are likely to have the same tree, then groups genes into sets using a graph-theoretic optimization, estimates a tree on each subset using concatenation, and finally produces an estimated species tree from these trees using the preferred coalescent-based method. Statistical binning improves the accuracy of MP-EST, a popular coalescent-based method, and we use it to produce the first genome-scale coalescent-based avian tree of life.


PLOS ONE | 2014

Accurate Phylogenetic Tree Reconstruction from Quartets: A Heuristic Approach

Rezwana Reaz; Md. Shamsuzzoha Bayzid; M. Sohel Rahman

Supertree methods construct trees on a set of taxa (species) by combining many smaller trees on overlapping subsets of the entire set of taxa. A ‘quartet’ is an unrooted tree over four taxa; hence, quartet-based supertree methods combine many four-taxon unrooted trees into a single, coherent tree over the complete set of taxa. Quartet-based phylogeny reconstruction methods have been receiving considerable attention in recent years. An accurate and efficient quartet-based method might be competitive with the current best phylogenetic tree reconstruction methods (such as maximum likelihood or Bayesian MCMC analyses), without being as computationally intensive. In this paper, we present a novel and highly accurate quartet-based phylogenetic tree reconstruction method. We performed an extensive experimental study to evaluate the accuracy and scalability of our approach on both simulated and biological datasets.
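
To make the notion of a quartet concrete, the following is a minimal sketch of how the quartets induced by a known tree can be enumerated. It is not the reconstruction method of the paper, which works in the opposite direction (from quartets to a tree). The tree encoding, the function names, and the five-taxon example are assumptions made for illustration. The sketch relies on the four-point condition: for leaves a, b, c, d of a tree with positive edge weights, the induced split ab|cd is the pairing with the smallest sum of path lengths.

```python
from itertools import combinations


def build_tree(edges):
    """Adjacency map {node: {neighbor: weight}} from (u, v, weight) triples."""
    adj = {}
    for u, v, w in edges:
        adj.setdefault(u, {})[v] = w
        adj.setdefault(v, {})[u] = w
    return adj


def leaf_distances(adj, leaves):
    """Path length between every ordered pair of leaves."""
    dist = {}
    for src in leaves:
        seen = {src: 0.0}
        stack = [src]
        while stack:
            node = stack.pop()
            for nxt, w in adj[node].items():
                if nxt not in seen:
                    seen[nxt] = seen[node] + w
                    stack.append(nxt)
        for dst in leaves:
            dist[src, dst] = seen[dst]
    return dist


def induced_quartet(dist, a, b, c, d):
    """Return the split ((x, y), (z, w)) that the tree induces on four leaves."""
    pairings = [((a, b), (c, d)), ((a, c), (b, d)), ((a, d), (b, c))]
    # Four-point condition: the true split has the smallest distance sum.
    return min(pairings, key=lambda p: dist[p[0]] + dist[p[1]])


# Hypothetical five-taxon caterpillar tree with unit edge weights.
tree = build_tree([
    ("A", "x", 1), ("B", "x", 1), ("x", "y", 1), ("C", "y", 1),
    ("y", "z", 1), ("D", "z", 1), ("E", "z", 1),
])
leaves = sorted(n for n, nbrs in tree.items() if len(nbrs) == 1)
dist = leaf_distances(tree, leaves)
quartets = [induced_quartet(dist, *four) for four in combinations(leaves, 4)]
for left, right in quartets:
    print("".join(left) + "|" + "".join(right))
```

A tree on n taxa induces one quartet for each of the C(n, 4) four-taxon subsets, which is why quartet-based methods must cope with input sizes that grow quickly with n.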


Science | 2015

Response to Comment on “Statistical binning enables an accurate coalescent-based estimation of the avian tree”

Siavash Mirarab; Md. Shamsuzzoha Bayzid; Bastien Boussau; Tandy J. Warnow

Liu and Edwards argue against the use of weighted statistical binning within a species tree estimation pipeline. However, we show that their mathematical argument does not apply to weighted statistical binning. Furthermore, their simulation study does not follow the recommended statistical binning protocol and has data of unknown origin that bias the results against weighted statistical binning.


International Scholarly Research Notices | 2013

HMEC: A Heuristic Algorithm for Individual Haplotyping with Minimum Error Correction

Md. Shamsuzzoha Bayzid; Md. Maksudul Alam; Abdullah Mueen; Md. Saidur Rahman

A haplotype is a pattern of single nucleotide polymorphisms (SNPs) on a single chromosome. Constructing a pair of haplotypes from aligned and overlapping but intermixed and erroneous fragments of the chromosomal sequences is a nontrivial problem. The minimum error correction approach seeks to minimize the number of errors that must be corrected so that the pair of haplotypes can be constructed through consensus of the fragments. We give a heuristic algorithm that searches through alternative solutions using a gain measure and stops whenever no better solution can be achieved. The time complexity of each iteration is O(m³k) for an m × k SNP matrix, where m is the number of fragments (rows) and k is the number of SNP sites (columns). An alternative gain measure is also given to reduce the running time. Experimental results show that our algorithm outperforms the best known previous algorithm.
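
As an illustration of the objective being minimized, here is a minimal sketch that evaluates the minimum error correction (MEC) score of a candidate haplotype pair against an m × k SNP matrix. It only scores a given solution; it is not the HMEC search heuristic itself. The string encoding (`'0'`/`'1'` for alleles, `'-'` for sites a fragment does not cover), the function names, and the example matrix are assumptions made for illustration.

```python
def mismatches(fragment, haplotype):
    """Count covered sites where the fragment disagrees with the haplotype."""
    return sum(1 for f, h in zip(fragment, haplotype) if f != "-" and f != h)


def mec_score(fragments, h1, h2):
    """Total corrections needed when each fragment joins its closer haplotype."""
    return sum(min(mismatches(f, h1), mismatches(f, h2)) for f in fragments)


# Hypothetical 6 x 4 SNP matrix: 6 fragments (rows), 4 SNP sites (columns).
fragments = ["01-0", "0110", "10-1", "1001", "011-", "1101"]
h1, h2 = "0110", "1001"

score = mec_score(fragments, h1, h2)
print(score)
```

In this example only the last fragment conflicts with its closer haplotype (at one site), so the score is 1. A local search in the spirit of the abstract would repeatedly move fragments between the two groups while some move lowers this score.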


Computing and Combinatorics Conference | 2010

Discovering pairwise compatibility graphs

Muhammad Nur Yanhaona; Md. Shamsuzzoha Bayzid; Md. Saidur Rahman

Let T be an edge-weighted tree, let dT(u, v) be the sum of the weights of the edges on the path from u to v in T, and let dmin and dmax be two nonnegative real numbers such that dmin ≤ dmax. Then a pairwise compatibility graph of T for dmin and dmax is a graph G = (V, E), where each vertex u′ ∈ V corresponds to a leaf u of T and there is an edge (u′, v′) ∈ E if and only if dmin ≤ dT(u, v) ≤ dmax. A graph G is called a pairwise compatibility graph (PCG) if there exists an edge-weighted tree T and two nonnegative real numbers dmin and dmax such that G is a pairwise compatibility graph of T for dmin and dmax. Kearney et al. conjectured that every graph is a PCG [3]. In this paper, we refute the conjecture by showing that not all graphs are PCGs. We also show that the well-known tree power graphs and some of their extensions are PCGs.
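
The definition above can be turned directly into a small construction. The following is a minimal sketch, assuming the tree is given as a list of weighted edges; the function names and the four-leaf example tree are hypothetical and chosen only for illustration.

```python
def build_tree(edges):
    """Adjacency map {node: {neighbor: weight}} from (u, v, weight) triples."""
    adj = {}
    for u, v, w in edges:
        adj.setdefault(u, {})[v] = w
        adj.setdefault(v, {})[u] = w
    return adj


def distances_from(adj, src):
    """Weighted path length from src to every node of the tree."""
    dist = {src: 0.0}
    stack = [src]
    while stack:
        node = stack.pop()
        for nxt, weight in adj[node].items():
            if nxt not in dist:
                dist[nxt] = dist[node] + weight
                stack.append(nxt)
    return dist


def pairwise_compatibility_graph(adj, dmin, dmax):
    """Vertices are the leaves of T; (u, v) is an edge iff dmin <= dT(u, v) <= dmax."""
    leaves = sorted(n for n, nbrs in adj.items() if len(nbrs) == 1)
    edges = set()
    for u in leaves:
        dist = distances_from(adj, u)
        for v in leaves:
            if u < v and dmin <= dist[v] <= dmax:
                edges.add((u, v))
    return leaves, edges


# Hypothetical tree: leaves a, b hang off x; leaves c, d hang off y; x-y has weight 2.
tree = build_tree([
    ("a", "x", 1), ("b", "x", 1), ("x", "y", 2), ("c", "y", 1), ("d", "y", 1),
])

vertices, far_edges = pairwise_compatibility_graph(tree, dmin=3, dmax=5)
_, near_edges = pairwise_compatibility_graph(tree, dmin=0, dmax=2)
print(sorted(far_edges))
print(sorted(near_edges))
```

With dmin = 3 and dmax = 5 only leaf pairs on opposite sides of the x-y edge (at distance 4) are adjacent, giving a 4-cycle; with dmin = 0 and dmax = 2 only the sibling pairs (at distance 2) are adjacent. The same tree can therefore witness different PCGs depending on the chosen interval.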


International Conference on Electrical and Control Engineering | 2008

Application of artificial neural network in social computing in the context of third world countries

Md. Shamsuzzoha Bayzid; Anindya Iqbal; Chowdhury Sayeed Hyder; Mohammad Tanvir Irfan

In the last decade, applications associated with artificial neural networks (ANNs) have been gaining popularity in both the academic research and practitioners' sectors. Unfortunately, this versatile tool has not yet been used in underdeveloped countries. Here we consider some important sectors and explore the applicability of ANNs in the context of third world countries. We explore the design of a feed-forward neural network for (1) assisting microcredit institutions in selecting appropriate locations to set up branches and (2) determining the HIV risk of a locality. The simulation procedure and results are discussed accordingly.


Biomedical Engineering and Informatics | 2008

A Heuristic Algorithm for Individual Haplotyping with Minimum Error Correction

Abdullah Mueen; Md. Shamsuzzoha Bayzid; Md. Saidur Rahman; Md. Maksudul Alam

A haplotype is a pattern of single nucleotide polymorphisms (SNPs) on a single chromosome. Constructing a pair of haplotypes from aligned and overlapping but intermixed and erroneous fragments of the chromosomal sequences is a nontrivial problem. The minimum error correction approach seeks to minimize the number of errors that must be corrected so that the pair of haplotypes can be constructed through consensus of the fragments. We give a heuristic algorithm that searches through alternative solutions using a gain measure and stops whenever no better solution can be achieved. The time complexity of each iteration is O(m³k) for an m × k SNP matrix, where m is the number of fragments (rows) and k is the number of SNP sites (columns). An alternative gain measure is also given to reduce the running time. Experimental results show that our algorithm outperforms the best known previous algorithm.


bioRxiv | 2018

Multi-objective formulation of MSA for phylogeny estimation (Do phylogeny-aware measures guide towards better phylogenetic tree?)

Muhammad Ali Nayeem; Md. Shamsuzzoha Bayzid; Atif Rahman; Rifat Shahriyar; M. Sohel Rahman

Multiple sequence alignment (MSA) is a basic step in many analyses in computational biology, including predicting the structure and function of proteins, orthology prediction and estimating phylogenies. The objective of MSA is to infer the homology among the sequences of chosen species. Commonly, MSAs are inferred by optimizing a single function or objective. The alignments estimated under one criterion may differ from the alignments generated by other criteria, inferring discordant homologies and thus leading to different evolutionary histories relating the sequences. In the recent past, researchers have advocated a multi-objective formulation of MSA to address this issue, in which multiple conflicting objective functions are optimized simultaneously to generate a set of alignments. However, no theoretical or empirical justification with respect to a real-life application has been shown for a particular multi-objective formulation. In this study, we investigate the impact of multi-objective formulation in the context of phylogenetic tree estimation. Employing multi-objective metaheuristics, we demonstrate that trees estimated on the alignments generated by multi-objective formulation are substantially better than the trees estimated by state-of-the-art MSA tools, including PASTA, MUSCLE, CLUSTAL and MAFFT. We also demonstrate that highly accurate alignments with respect to popular measures such as the sum-of-pairs (SP) score and the total-column (TC) score do not necessarily lead to highly accurate phylogenetic trees. Thus, in essence, we ask whether a phylogeny-aware metric can guide us in choosing appropriate multi-objective formulations that result in better phylogeny estimation. We answer this question affirmatively through a carefully designed, extensive empirical study. As a by-product, we also suggest a methodology for the primary selection of a set of objective functions for a multi-objective formulation, based on their association with the resulting phylogenetic tree.


Biomedical Engineering and Informatics | 2010

A heuristic algorithm for Minimum Conflict Individual Haplotyping

Md. Shamsuzzoha Bayzid; Md. Maksudul Alam; Md. Saidur Rahman

A haplotype is a pattern of single nucleotide polymorphisms (SNPs) on a single chromosome. Constructing a pair of haplotypes from aligned and overlapping but intermixed and erroneous fragments of the chromosomal sequences is a nontrivial problem. The minimum error correction (MEC) model, which is the most widely used model, minimizes the number of errors to be corrected so that the pair of haplotypes can be constructed through consensus of the fragments. However, this model is effective only when the error rate of SNP fragments is low. To overcome this problem, Zhang et al. proposed a new model called Minimum Conflict Individual Haplotyping (MCIH) as an extension to MEC [1]. This new model uses both SNP fragment information and related genotype information for haplotype reconstruction. MCIH has already been shown to be a promising alternative in individual haplotyping. In this paper, we give a heuristic algorithm for MCIH that searches through alternative solutions using a gain measure and stops whenever no better solution can be achieved. Experimental results on real data show that our algorithm performs better than the best known algorithm for MEC and the algorithm for MCIH proposed by Zhang et al. [1].

Collaboration


Explore Md. Shamsuzzoha Bayzid's collaborations.

Top Co-Authors

Md. Saidur Rahman

Bangladesh University of Engineering and Technology

Md. Maksudul Alam

Bangladesh University of Engineering and Technology

M. Sohel Rahman

Bangladesh University of Engineering and Technology

Muhammad Nur Yanhaona

Bangladesh University of Engineering and Technology

Abdullah Mueen

University of New Mexico

Anindya Iqbal

Bangladesh University of Engineering and Technology

Atif Rahman

Bangladesh University of Engineering and Technology

Chowdhury Sayeed Hyder

Bangladesh University of Engineering and Technology
