Nam Phuong Nguyen | Researchain

Publication

Featured research published by Nam Phuong Nguyen.

Proceedings of the National Academy of Sciences of the United States of America | 2014

Phylotranscriptomic analysis of the origin and early diversification of land plants

Norman J. Wickett; Siavash Mirarab; Nam Phuong Nguyen; Tandy J. Warnow; Eric J. Carpenter; Naim Matasci; Saravanaraj Ayyampalayam; Michael S. Barker; J. Gordon Burleigh; Matthew A. Gitzendanner; Brad R. Ruhfel; Eric Wafula; Joshua P. Der; Sean W. Graham; Sarah Mathews; Michael Melkonian; Douglas E. Soltis; Pamela S. Soltis; Nicholas W. Miles; Carl J. Rothfels; Lisa Pokorny; A. Jonathan Shaw; Lisa De Gironimo; Dennis W. Stevenson; Barbara Surek; Juan Carlos Villarreal; Béatrice Roure; Hervé Philippe; Claude W. de Pamphilis; Tao Chen

Significance: Early branching events in the diversification of land plants and closely related algal lineages remain fundamental and unresolved questions in plant evolutionary biology. Accurate reconstructions of these relationships are critical for testing hypotheses of character evolution: for example, the origins of the embryo, vascular tissue, seeds, and flowers. We investigated relationships among streptophyte algae and land plants using the largest set of nuclear genes that has been applied to this problem to date. Hypothesized relationships were rigorously tested through a series of analyses to assess systematic errors in phylogenetic inference caused by sampling artifacts and model misspecification. Results support some generally accepted phylogenetic hypotheses, while rejecting others. This work provides a new framework for studies of land plant evolution.

Reconstructing the origin and evolution of land plants and their algal relatives is a fundamental problem in plant phylogenetics, and is essential for understanding how critical adaptations arose, including the embryo, vascular tissue, seeds, and flowers. Despite advances in molecular systematics, some hypotheses of relationships remain weakly resolved. Inferring deep phylogenies with bouts of rapid diversification can be problematic; however, genome-scale data should significantly increase the number of informative characters for analyses. Recent phylogenomic reconstructions focused on the major divergences of plants have resulted in promising but inconsistent results. One limitation is sparse taxon sampling, likely resulting from the difficulty and cost of data generation. To address this limitation, transcriptome data for 92 streptophyte taxa were generated and analyzed along with 11 published plant genome sequences. Phylogenetic reconstructions were conducted using up to 852 nuclear genes and 1,701,170 aligned sites. Sixty-nine analyses were performed to test the robustness of phylogenetic inferences to permutations of the data matrix or to phylogenetic method, including supermatrix, supertree, and coalescent-based approaches, maximum-likelihood and Bayesian methods, partitioned and unpartitioned analyses, and amino acid versus DNA alignments. Among other results, we find robust support for a sister-group relationship between land plants and one group of streptophyte green algae, the Zygnematophyceae. Strong and robust support for a clade comprising liverworts and mosses is inconsistent with a widely accepted view of early land plant evolution, and suggests that phylogenetic hypotheses used to understand the evolution of fundamental plant traits should be reevaluated.

GigaScience | 2014

Data access for the 1,000 Plants (1KP) project

Naim Matasci; Ling Hong Hung; Zhixiang Yan; Eric J. Carpenter; Norman J. Wickett; Siavash Mirarab; Nam Phuong Nguyen; Tandy J. Warnow; Saravanaraj Ayyampalayam; Michael S. Barker; J. G. Burleigh; Matthew A. Gitzendanner; Eric Wafula; Joshua P. Der; Claude W. dePamphilis; Béatrice Roure; Hervé Philippe; Brad R. Ruhfel; Nicholas W. Miles; Sean W. Graham; Sarah Mathews; Barbara Surek; Michael Melkonian; Douglas E. Soltis; Pamela S. Soltis; Carl J. Rothfels; Lisa Pokorny; Jonathan Shaw; Lisa DeGironimo; Dennis W. Stevenson

The 1,000 plants (1KP) project is an international multi-disciplinary consortium that has generated transcriptome data from over 1,000 plant species, with exemplars for all of the major lineages across the Viridiplantae (green plants) clade. Here, we describe how to access the data used in a phylogenomics analysis of the first 85 species, and how to visualize our gene and species trees. Users can develop computational pipelines to analyse these data, in conjunction with data of their own that they can upload. Computationally estimated protein-protein interactions and biochemical pathways can be visualized at another site. Finally, we comment on our future plans and how they fit within this scalable system for the dissemination, visualization, and analysis of large multi-species data sets.

Journal of Computational Biology | 2015

PASTA: Ultra-Large Multiple Sequence Alignment for Nucleotide and Amino-Acid Sequences

Siavash Mirarab; Nam Phuong Nguyen; Sheng Guo; Li-San Wang; Junhyong Kim; Tandy J. Warnow

We introduce PASTA, a new multiple sequence alignment algorithm. PASTA uses a new technique to produce an alignment given a guide tree that enables it to be both highly scalable and very accurate. We present a study on biological and simulated data with up to 200,000 sequences, showing that PASTA produces highly accurate alignments, improving on the accuracy and scalability of the leading alignment methods (including SATé). We also show that trees estimated on PASTA alignments are highly accurate: slightly better than SATé trees, but with substantial improvements relative to other methods. Finally, PASTA is faster than SATé, highly parallelizable, and requires relatively little memory.

npj Biofilms and Microbiomes | 2016

A perspective on 16S rRNA operational taxonomic unit clustering using sequence similarity

Nam Phuong Nguyen; Tandy J. Warnow; Mihai Pop; Bryan A. White

The standard pipeline for 16S amplicon analysis starts by clustering sequences within a percent sequence similarity threshold (typically 97%) into ‘Operational Taxonomic Units’ (OTUs). From each OTU, a single sequence is selected as a representative. This representative sequence is annotated, and that annotation is applied to all remaining sequences within that OTU. This perspective paper will discuss the known shortcomings of this standard approach using results obtained from the Human Microbiome Project. In particular, we will show that the traditional approach of using pairwise sequence alignments to compute sequence similarity can result in poorly clustered OTUs. As OTUs are typically annotated based upon a single representative sequence, poorly clustered OTUs can have significant impact on downstream analyses. These results suggest that we need to move beyond simple clustering techniques for 16S analysis.
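The clustering step described above can be sketched in a few lines. This is a minimal illustration only, not the pipeline used in the paper: it assumes the sequences are already aligned to equal length (so identity is a simple per-position comparison), and the names `identity` and `greedy_otus` are hypothetical helpers introduced here.

```python
def identity(a: str, b: str) -> float:
    """Fraction of matching positions between two pre-aligned, equal-length sequences."""
    if len(a) != len(b):
        raise ValueError("sequences must be pre-aligned to equal length")
    if not a:
        return 0.0
    return sum(x == y for x, y in zip(a, b)) / len(a)


def greedy_otus(seqs, threshold=0.97):
    """Greedy clustering: each sequence joins the first OTU whose
    representative it matches at >= threshold, else it founds a new OTU."""
    otus = []  # list of (representative, members) pairs
    for s in seqs:
        for rep, members in otus:
            if identity(rep, s) >= threshold:
                members.append(s)
                break
        else:
            # No existing representative is similar enough.
            otus.append((s, [s]))
    return otus


# Toy data (assumed, 100 positions each).
base = "A" * 100
s2 = "A" * 98 + "CC"      # 98% identical to base
s3 = "A" * 90 + "C" * 10  # 90% identical to base
s4 = "A" * 96 + "CCCC"    # 96% identical to base, 98% identical to s2

otus = greedy_otus([base, s2, s3, s4])
for rep, members in otus:
    print(len(members), "member(s), representative starts with", rep[:5])
```

In this toy input, `s4` is 98% identical to `s2` yet ends up in a different OTU, because it is compared only against each OTU's representative. This is one simple way in which threshold-based clustering around a single representative can separate similar sequences; the paper's own analysis concerns how pairwise alignments are used to compute similarity.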

Genome Biology | 2015

Ultra-large alignments using phylogeny-aware profiles

Nam Phuong Nguyen; Siavash Mirarab; Keerthana Kumar; Tandy J. Warnow

Many biological questions, including the estimation of deep evolutionary histories and the detection of remote homology between protein sequences, rely upon multiple sequence alignments and phylogenetic trees of large datasets. However, accurate large-scale multiple sequence alignment is very difficult, especially when the dataset contains fragmentary sequences. We present UPP, a multiple sequence alignment method that uses a new machine learning technique, the ensemble of hidden Markov models, which we propose here. UPP produces highly accurate alignments for both nucleotide and amino acid sequences, even on ultra-large datasets or datasets containing fragmentary sequences. UPP is available at https://github.com/smirarab/sepp.

Journal of the Royal Society Interface | 2009

Modelling amorphous computations with transcription networks

Zack B. Simpson; Timothy L. Tsai; Nam Phuong Nguyen; Xi Chen; Andrew D. Ellington

The power of electronic computation is due in part to the development of modular gate structures that can be coupled to carry out sophisticated logical operations and whose performance can be readily modelled. However, the equivalences between electronic and biochemical operations are far from obvious. In order to help cross between these disciplines, we develop an analogy between complementary metal oxide semiconductor and transcriptional logic gates. We surmise that these transcriptional logic gates might prove to be useful in amorphous computations and model the abilities of immobilized gates to form patterns. Finally, to begin to implement these computations, we design unique hairpin transcriptional gates and then characterize these gates in a binary latch similar to that already demonstrated by Kim et al. (Kim, White & Winfree 2006 Mol. Syst. Biol. 2, 68 (doi:10.1038/msb4100099)). The hairpin transcriptional gates are uniquely suited to the design of a complementary NAND gate that can serve as an underlying basis of molecular computing that can output matter rather than electronic information.

Journal of Theoretical Biology | 2010

Design and analysis of a robust genetic Muller C-element

Nam Phuong Nguyen; Chris J. Myers; Hiroyuki Kuwahara; Chris Winstead; James P. Keener

This paper presents results on the design and analysis of a robust genetic Muller C-element. The Muller C-element is a standard logic gate commonly used to synchronize independent processes in most asynchronous electronic circuits. Synthetic biological logic gates have been previously demonstrated, but there remain many open issues in the design of sequential (state-holding) logic operations. Three designs are considered for the genetic Muller C-element: a majority gate, a toggle switch, and a speed-independent implementation. While the three designs are logically equivalent, each design requires different assumptions to operate correctly. The majority gate design requires the most timing assumptions, the speed-independent design requires the least, and the toggle switch design is a compromise between the two. This paper examines the robustness of these designs as well as the effects of parameter variation using stochastic simulation. The results show that robustness to timing assumptions does not necessarily increase reliability, suggesting that modifications to existing logic design tools are going to be necessary for synthetic biology. Parameter variation simulations yield further insights into the design principles necessary for building robust genetic gates. The results suggest that high gene count, cooperativity of at least two, tight repression, and balanced decay rates are necessary for robust gates. Finally, this paper presents a potential application of the genetic Muller C-element as a quorum-mediated trigger.
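For readers unfamiliar with the gate, the logical behaviour of a two-input Muller C-element can be sketched as below. This is a Boolean abstraction only, assuming ideal 0/1 signals; it does not model the genetic circuits, timing assumptions, or stochastic simulations studied in the paper. The function names are hypothetical.

```python
from itertools import product


def c_element(a: int, b: int, prev: int) -> int:
    """Muller C-element: output follows the inputs when they agree,
    otherwise it holds its previous state."""
    return a if a == b else prev


def majority(a: int, b: int, c: int) -> int:
    """Three-input majority vote."""
    return int(a + b + c >= 2)


# Feeding the output back into a majority gate gives the same logic,
# which is the idea behind a majority-gate style design.
equivalent = all(
    c_element(a, b, p) == majority(a, b, p)
    for a, b, p in product((0, 1), repeat=3)
)

# Trace the state-holding behaviour over a sequence of input pairs.
state = 0
trace = []
for a, b in [(0, 0), (1, 0), (1, 1), (0, 1), (0, 0)]:
    state = c_element(a, b, state)
    trace.append(state)

print(equivalent, trace)
```

The output only rises once both inputs are high and only falls once both are low, which is what lets the gate synchronize independent processes.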

Systematic Biology | 2017

Phylogenomics from whole genome sequences using aTRAM

Julie M. Allen; Bret M. Boyd; Nam Phuong Nguyen; Pranjal Vachaspati; Tandy J. Warnow; Daisie Iris Huang; Patrick G.S. Grady; Kayce C. Bell; Quentin C. B. Cronk; Lawrence Mugisha; Barry R. Pittendrigh; M. Soledad Leonardi; David L. Reed; Kevin P. Johnson

Novel sequencing technologies are rapidly expanding the size of data sets that can be applied to phylogenetic studies. Currently the most commonly used phylogenomic approaches involve some form of genome reduction. While these approaches make assembling phylogenomic data sets more economical for organisms with large genomes, they reduce the genomic coverage and thereby the long‐term utility of the data. Currently, for organisms with moderate to small genomes (<1000 Mbp) it is feasible to sequence the entire genome at modest coverage (10‐30×). Computational challenges for handling these large data sets can be alleviated by assembling targeted reads, rather than assembling the entire genome, to produce a phylogenomic data matrix. Here we demonstrate the use of automated Target Restricted Assembly Method (aTRAM) to assemble 1107 single‐copy ortholog genes from whole genome sequencing of sucking lice (Anoplura) and out‐groups. We developed a pipeline to extract exon sequences from the aTRAM assemblies by annotating them with respect to the original target protein. We aligned these protein sequences with the inferred amino acids and then performed phylogenetic analyses on both the concatenated matrix of genes and on each gene separately in a coalescent analysis. Finally, we tested the limits of successful assembly in aTRAM by assembling 100 genes from close‐ to distantly related taxa at high to low levels of coverage. Both the concatenated analysis and the coalescent‐based analysis produced the same tree topology, which was consistent with previously published results and resolved weakly supported nodes. These results demonstrate that this approach is successful at developing phylogenomic data sets from raw genome sequencing reads. Further, we found that with coverages above 5‐10×, aTRAM was successful at assembling 80‐90% of the contigs for both close and distantly related taxa. As sequencing costs continue to decline, we expect full genome sequencing will become more feasible for a wider array of organisms, and aTRAM will enable mining of these genomic data sets for an extensive variety of applications, including phylogenomics. [aTRAM; gene assembly; genome sequencing; phylogenomics.]

Molecular Biology and Evolution | 2017

Primates, Lice and Bacteria: Speciation and Genome Evolution in the Symbionts of Hominid Lice

Bret M. Boyd; Julie M. Allen; Nam Phuong Nguyen; Pranjal Vachaspati; Zachary S. Quicksall; Tandy J. Warnow; Lawrence Mugisha; Kevin P. Johnson; David L. Reed

Insects with restricted diets rely on symbiotic bacteria to provide essential metabolites missing in their diet. The blood-sucking lice are obligate, host-specific parasites of mammals and are themselves host to symbiotic bacteria. In human lice, these bacterial symbionts supply the lice with B-vitamins. Here, we sequenced the genomes of symbiotic and heritable bacteria of human, chimpanzee, gorilla, and monkey lice and used phylogenomics to investigate their evolutionary relationships. We find that these symbionts have a phylogenetic history reflecting the louse phylogeny, a finding contrary to previous reports of symbiont replacement. Examination of the highly reduced symbiont genomes (0.53–0.57 Mb) reveals much of the genomes are dedicated to vitamin synthesis. This is unchanged in the smallest symbiont genome and one that appears to have been reorganized. Specifically, symbionts from human lice, chimpanzee lice, and gorilla lice carry a small plasmid that encodes synthesis of vitamin B5, a vitamin critical to the bacteria-louse symbiosis. This plasmid is absent in an Old World monkey louse symbiont, where this pathway is on its primary chromosome. This suggests the unique genomic configuration brought about by the plasmid is not essential for symbiosis, but once obtained, it has persisted for up to 25 My. We also find evidence that human, chimpanzee, and gorilla louse endosymbionts have lost a pathway for synthesis of vitamin B1, whereas the monkey louse symbiont has retained this pathway. It is unclear whether these changes are adaptive, but they may point to evolutionary responses of louse symbionts to shifts in primate biology.

Emerging microbes & infections | 2016

High diversity of picornaviruses in rats from different continents revealed by deep sequencing

Thomas Arn Hansen; Sarah Mollerup; Nam Phuong Nguyen; Nicole E. White; Megan L. Coghlan; David E. Alquezar-Planas; Tejal Joshi; Randi Holm Jensen; Helena Fridholm; Kristín Rós Kjartansdóttir; Tobias Mourier; Tandy J. Warnow; Graham J. Belsham; Michael Bunce; Lars Peter Nielsen; Lasse Vinner; Anders J. Hansen

Outbreaks of zoonotic diseases in humans and livestock are not uncommon, and an important component in containment of such emerging viral diseases is rapid and reliable diagnostics. Such methods are often PCR-based and hence require the availability of sequence data from the pathogen. Rattus norvegicus (R. norvegicus) is a known reservoir for important zoonotic pathogens. Transmission may be direct via contact with the animal, for example, through exposure to its faecal matter, or indirectly mediated by arthropod vectors. Here we investigated the viral content in rat faecal matter (n=29) collected from two continents by analyzing 2.2 billion next-generation sequencing reads derived from both DNA and RNA. Among other virus families, we found sequences from members of the Picornaviridae to be abundant in the microbiome of all the samples. Here we describe the diversity of the picornavirus-like contigs including near-full-length genomes closely related to the Boone cardiovirus and Theiler’s encephalomyelitis virus. From this study, we conclude that picornaviruses within R. norvegicus are more diverse than previously recognized. The virome of R. norvegicus should be investigated further to assess the full potential for zoonotic virus transmission.
