Publication


Featured research published by Koretsugu Ogata.


Current Microbiology | 2000

Phenotypic Characterization of Polysaccharidases Produced by Four Prevotella Type Strains

Hiroki Matsui; Koretsugu Ogata; Kiyoshi Tajima; Mutsumi Nakamura; Takafumi Nagamine; Rustam I. Aminov; Yoshimi Benno

Four ruminal Prevotella type strains, P. ruminicola JCM8958T, P. bryantii B14T, P. albensis M384T, and P. brevis ATCC19188T, were characterized for polysaccharide-degrading activities with the reducing sugar release assay and zymogram analyses. Carboxymethylcellulase, xylanase, and polygalacturonate (PG)-degrading enzyme activities were determined in cultures grown on oat spelt xylan, xylose, arabinose, cellobiose, and glucose as sole growth substrates. P. ruminicola and P. albensis showed carboxymethylcellulase induction patterns. When xylan was supplied as a sole growth substrate, xylanase activities produced by P. bryantii and P. albensis were at least 18- and 11-fold higher, respectively, than during growth on other carbohydrates, suggesting that the regulation of the xylanases was highly specific to xylan. All strains constitutively produced PG-degrading enzymes. The corresponding activity of P. bryantii was more than 40-fold higher than in other strains. Zymogram analyses routinely detected the presence of high-molecular-weight (100–170 kDa) polysaccharide-degrading enzymes in ruminal Prevotella. Characteristics of the polysaccharide-degrading activities showed diversity of ruminal Prevotella species.
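
The fold-induction figures quoted above reduce to a ratio of enzyme activities measured on different growth substrates. A minimal Python sketch of that calculation, using invented activity values rather than data from the paper:

# Illustrative only: the activity values below are invented, not data from the paper.
activities = {  # specific activity (arbitrary units) by growth substrate
    "xylan": 180.0,
    "xylose": 10.0,
    "arabinose": 8.5,
    "cellobiose": 9.2,
    "glucose": 7.8,
}

# Fold-induction on xylan relative to the highest activity on any other substrate.
baseline = max(v for k, v in activities.items() if k != "xylan")
fold_induction = activities["xylan"] / baseline
print(f"xylanase fold-induction on xylan: {fold_induction:.1f}x")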


Scientific Reports | 2013

The significance of microscopic mass spectrometry with high resolution in the visualisation of drug distribution

Masahiro Yasunaga; Masaru Furuta; Koretsugu Ogata; Yoshikatsu Koga; Yoshiyuki Yamamoto; Misato Takigahira; Yasuhiro Matsumura

The visualisation and quantitative analysis of the native drug distribution in a pre-clinical or clinical setting are desirable for evaluating drug effects and optimising drug design. Here, using matrix-assisted laser desorption ionisation imaging mass spectrometry (MALDI-IMS) with enhanced resolution and sensitivity, we compared the distribution of a paclitaxel (PTX)-incorporating micelle (NK105) with that of PTX alone after injection into tumour-bearing mice. We demonstrated optically and quantitatively that NK105 delivered more PTX to the tumour, including the centre of the tumour, while delivering less PTX to normal neural tissue, compared with injection with PTX alone. NK105 treatment yielded a greater antitumour effect and less neural toxicity in mice than did PTX treatment. The use of high-resolution MALDI-IMS may be an innovative approach for pharmacological evaluation and drug design support.
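
In imaging mass spectrometry, the quantitative comparison described above amounts to averaging the drug-ion signal over pixels inside region-of-interest masks (tumour versus normal tissue). The sketch below is illustrative only: the intensity grid and the masks are synthetic stand-ins, not MALDI-IMS data from the study.

import numpy as np

# Synthetic stand-in for a MALDI-IMS ion image: drug-ion intensity per pixel.
rng = np.random.default_rng(0)
intensity = rng.gamma(shape=2.0, scale=50.0, size=(100, 100))

# Hypothetical region-of-interest masks (tumour tissue vs. neural tissue).
tumour_mask = np.zeros((100, 100), dtype=bool)
tumour_mask[20:60, 20:60] = True
nerve_mask = np.zeros((100, 100), dtype=bool)
nerve_mask[70:90, 70:90] = True

# Mean drug signal per region is the basis for the quantitative comparison.
tumour_mean = intensity[tumour_mask].mean()
nerve_mean = intensity[nerve_mask].mean()
print(f"tumour mean: {tumour_mean:.1f}  nerve mean: {nerve_mean:.1f}  ratio: {tumour_mean / nerve_mean:.2f}")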


Current Microbiology | 2001

The Replicon of the Cryptic Plasmid pSBO1 Isolated from Streptococcus bovis JB1

Mutsumi Nakamura; Koretsugu Ogata; Takafumi Nagamine; Kiyoshi Tajima; Hiroki Matsui; Yoshimi Benno

The cryptic plasmid pSBO1 (3904 bp) was isolated from Streptococcus bovis JB1. pSBO1 contained an open reading frame (ORF) that is homologous to sequences encoding the replication protein (Rep) in pEFC1 (isolated from Enterococcus faecalis), pSK639 (Staphylococcus epidermidis), pLA103 (Lactobacillus acidophilus), and pUCL287 (Tetragenococcus halophila). In addition, four 22-bp direct repeats (DRs) were located upstream of the putative replication gene (rep) of pSBO1. Recombinant plasmids (pSBE10 and pSBE11) containing the DRs and putative rep of pSBO1 replicated in S. bovis 12-U-1 and no8 strains. This result indicates that the putative rep encoded Rep and that the replicon of pSBO1 contained the DRs and the rep. Gel shift assays showed that the Rep of pSBO1 bound the 22-bp DRs.
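
The 22-bp direct repeats upstream of rep are exact repeats of the kind a simple k-mer scan can locate. A sketch of such a scan, run on a hypothetical sequence rather than the actual pSBO1 sequence:

from collections import defaultdict

def find_direct_repeats(seq, k=22, min_copies=2):
    """Return every k-mer occurring at least min_copies times, with its start positions."""
    positions = defaultdict(list)
    for i in range(len(seq) - k + 1):
        positions[seq[i:i + k]].append(i)
    return {kmer: pos for kmer, pos in positions.items() if len(pos) >= min_copies}

# Hypothetical upstream region with an artificial 22-bp repeat embedded twice
# (this is not the pSBO1 sequence).
repeat = "AATCGGTACCTTAGCAAGTTGA"
upstream = "GGGTTT" + repeat + "CCCAAA" + repeat + "TTTGGG"
for kmer, pos in find_direct_repeats(upstream).items():
    print(kmer, pos)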


Current Microbiology | 1999

Sequence Analysis of Small Cryptic Plasmids Isolated from Selenomonas ruminantium S20

Mutsumi Nakamura; Takafumi Nagamine; Koretsugu Ogata; Kiyoshi Tajima; Rustem I. Aminov; Yoshimi Benno

Two small cryptic plasmids, designated pONE429 and pONE430, were isolated from a rumen bacterium, Selenomonas ruminantium S20. The complete sequence of pONE429 was 2100 bp and contained one open reading frame (ORF) of 201 amino acids. The sequence of pONE430 was 1527 bp and contained one ORF of 171 amino acids whose product is similar to the replication proteins (Rep) of pOM1, pSN2, and pIM13, isolated from Butyrivibrio fibrisolvens, Staphylococcus aureus, and Bacillus subtilis, respectively. In these plasmids, the nucleotide sequence upstream of the rep gene contained conserved nucleotides that could serve as the double-strand origin (DSO) of a rolling-circle replication (RCR) mechanism. The plasmids pONE429, pONE430, pJJMI, pJDB21, and pS23, all isolated from S. ruminantium strains, share similar regions located within a <450-bp stretch. These shared regions may be the sites recognized by the host strain, S. ruminantium.


Current Microbiology | 1997

Construction of a Fibrobacter succinogenes genomic map and demonstration of diversity at the genomic level.

Koretsugu Ogata; Rustem I. Aminov; Takafumi Nagamine; Mutsumi Sugiura; Kiyoshi Tajima; Makoto Mitsumori; Tsutomu Sekizaki; Hiroshi Kudo; Hajime Minato; Yoshimi Benno

The genomic cleavage map of the type strain Fibrobacter succinogenes S85 was constructed. The restriction enzymes AscI, AvrII, FseI, NotI, and SfiI generated DNA fragments of a size distribution suitable for resolution by pulsed-field gel electrophoresis (PFGE). An average genome size of 3.6 Mb was obtained by summing the fragment sizes. The linkages between the 15 AscI fragments of the genome were determined by combining two approaches: isolation of linking clones and cross-hybridization of restriction fragments. The genome of F. succinogenes was found to consist of a single circular DNA molecule. Southern hybridization with specific probes allowed eight genetic markers to be located on the restriction map. The genome of this bacterium contains at least three rRNA operons. PFGE of three other strains of F. succinogenes gave estimated genome sizes close to that of the type strain. However, the RFLP patterns of these strains generated by AscI digestion were completely different. Pairwise comparison of the genomic fragment distribution between the type strain and the three isolates showed similarity levels in the range of 14.3% to 31.3%. No fragment common to all of these F. succinogenes strains could be detected by PFGE. The marked degree of genomic heterogeneity among members of this species makes genomic RFLP a highly discriminatory and useful molecular typing tool for population studies.
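
Two of the numbers in this abstract, the 3.6-Mb genome size and the pairwise similarity of fragment patterns, come from straightforward arithmetic on PFGE band lists: summing fragment sizes, and counting co-migrating bands. The sketch below illustrates both with invented fragment sizes and a band-sharing (Dice-type) coefficient as one plausible similarity measure; it does not use the published data and is not necessarily the exact statistic used in the paper.

# Invented AscI fragment sizes in kb; these are not the published values.
s85_fragments = [620, 540, 470, 390, 330, 280, 240, 190, 160, 130, 100, 75, 45, 20, 10]
isolate_fragments = [580, 505, 455, 410, 350, 270, 220, 165, 115, 85, 55, 25]

# Genome size estimate: the sum of restriction fragment sizes.
genome_size_mb = sum(s85_fragments) / 1000.0
print(f"estimated genome size: {genome_size_mb:.1f} Mb")

# Band-sharing (Dice-type) similarity: fragments count as shared if their sizes
# agree within a tolerance, mimicking co-migrating PFGE bands.
def band_sharing_similarity(a, b, tol=10):
    remaining = list(b)
    shared = 0
    for size in a:
        for other in remaining:
            if abs(size - other) <= tol:
                shared += 1
                remaining.remove(other)
                break
    return 2 * shared / (len(a) + len(b))

print(f"fragment-pattern similarity: {band_sharing_similarity(s85_fragments, isolate_fragments):.1%}")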


Current Microbiology | 2000

Characterization of the Cryptic Plasmid pSBO2 Isolated from Streptococcus bovis JB1 and Construction of a New Shuttle Vector

Mutsumi Nakamura; Koretsugu Ogata; Takafumi Nagamine; Kiyoshi Tajima; Hiroki Matsui; Yoshimi Benno

A cryptic plasmid designated pSBO2 (3582 bp) was isolated from Streptococcus bovis JB1. pSBO2 contained putative sites for a double-strand origin (dso), a small transcriptional repressor protein (Cop), countertranscribed RNAs (ctRNAs), and a replication protein (Rep), which were similar to those of pMV158 and pLS1, isolated from S. agalactiae, and pWVO1, isolated from Lactococcus lactis. The putative single-strand origin (sso) of pSBO2 was similar to those of pER341 and pST1, isolated from S. thermophilus. A recombinant plasmid, designated pSBE2, was constructed by joining the pECM184 vector to the DNA fragment of pSBO2 containing the sso, dso, cop, ctRNAs, and rep. When pSBE2 was introduced into S. bovis 12-U-1 and no8, the plasmids recovered from the transformants had lost a 160-bp fragment between the sso and dso. This plasmid, designated pSBE2A, transformed Escherichia coli and S. bovis strains 12-U-1 and no8 at high frequency; pSBE2A is therefore an effective shuttle vector.


Journal of Agricultural and Food Chemistry | 2014

In situ label-free visualization of orally dosed strictinin within mouse kidney by MALDI-MS imaging.

Yoonhee Kim; Yoshinori Fujimura; Masako Sasaki; Xue Yang; Daichi Yukihira; Daisuke Miura; Yumi Unno; Koretsugu Ogata; Hiroki Nakajima; Shuya Yamashita; Kanami Nakahara; Motoki Murata; I-Chian Lin; Hiroyuki Wariishi; Koji Yamada; Hirofumi Tachibana

Matrix-assisted laser desorption/ionization-mass spectrometry imaging (MALDI-MSI) is a powerful technique for visualizing the distribution of a wide range of biomolecules within tissue sections. However, a methodology for visualizing a bioactive ellagitannin has not yet been established. This paper presents a novel in situ label-free MALDI-MSI technique for visualizing the distribution of strictinin, a bioactive ellagitannin found in green tea, within mammalian kidney after oral dosing. Among nine representative matrix candidates, 1,5-diaminonaphthalene (1,5-DAN), harmane, and ferulic acid showed the highest sensitivity for strictinin spotted onto a MALDI sample plate. Of these, 1,5-DAN enabled the most sensitive two-dimensional imaging of strictinin spotted directly onto mouse kidney sections. Furthermore, 1,5-DAN-based MALDI-MSI could detect the unique distribution of orally dosed strictinin within kidney sections. This in situ label-free imaging technique will contribute to the localization analysis of strictinin and to the study of its biological mechanisms.


Current Microbiology | 1998

Transcriptional regulation of the Prevotella ruminicola recA gene

Roustam I. Aminov; Kiyoshi Tajima; Koretsugu Ogata; Takafumi Nagamine; Mutsumi Sugiura; Yoshimi Benno

The regulation of recA gene expression in the obligately anaerobic rumen bacterium Prevotella ruminicola was investigated by monitoring the recA-specific transcript level. P. ruminicola recA forms a monocistronic unit, but no SOS-box sequences resembling those of Escherichia coli or Bacillus subtilis can be identified upstream of the recA coding region. At the same time, we observed a fivefold increase in the level of recA mRNA in response to the DNA-damaging agents mitomycin C and methyl methanesulfonate, as well as under conditions of oxidative stress. No induction was detected when growth of P. ruminicola was arrested by shifting to acidic (pH 4.8) conditions. Primer extension experiments revealed three closely spaced transcriptional start sites for recA. The putative −10 and −35 RNA polymerase binding regions were proposed on the basis of transcript mapping. These regions bear very little similarity to the E. coli (σ70) and B. subtilis (σA) consensus sequences, as well as to the recognition sites of other minor σ-factors. Transcript mapping experiments in E. coli expressing P. ruminicola recA confirmed that the transcription machineries of these two bacteria recognize completely different regulatory sequences on the template to initiate transcription. Preliminary DNase I footprinting data revealed that the region of imperfect dyad symmetry (AATTATAATCAATTATAAAT) found between the putative −10 region and the translation initiation codon may serve as an SOS-box-like regulatory sequence in P. ruminicola. This sequence bears no similarity to known SOS-box sequences, in particular that of E. coli and other Gram-negative bacteria.
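
"Imperfect dyad symmetry" means the region is close to, but not exactly, its own reverse complement. A minimal check of the 20-bp sequence quoted above (illustrative only):

def reverse_complement(seq):
    comp = {"A": "T", "T": "A", "G": "C", "C": "G"}
    return "".join(comp[b] for b in reversed(seq))

def dyad_symmetry_matches(seq):
    """Count positions at which a sequence matches its own reverse complement."""
    rc = reverse_complement(seq)
    return sum(a == b for a, b in zip(seq, rc))

site = "AATTATAATCAATTATAAAT"  # the region of imperfect dyad symmetry named in the abstract
matches = dyad_symmetry_matches(site)
print(f"{matches}/{len(site)} positions match the reverse complement")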


Evolutionary Bioinformatics | 2007

Interactions Between SNP Alleles at Multiple Loci and Variation in Skin Pigmentation in 122 Caucasians

Sumiko Anno; Takashi Abe; Koichi Sairyo; Susumu Kudo; Takushi Yamamoto; Koretsugu Ogata; Vijay K. Goel

This study was undertaken to clarify the molecular basis for human skin color variation and the environmental adaptability to ultraviolet irradiation, with the ultimate goal of predicting the impact of changes in future environments on human health risk. One hundred twenty-two Caucasians living in Toledo, Ohio participated. Back and cheek skin were assayed for melanin as a quantitative trait marker. Buccal cell samples were collected and used for DNA extraction. DNA was used for SNP genotyping using the Masscode system, which entails two-step PCR amplification and a platform chemistry that allows cleavable mass spectrometry tags. The results show that gene-gene interaction between SNP alleles at multiple loci (not necessarily on the same chromosome) contributes to inter-individual skin color variation while suggesting a high probability of linkage disequilibrium. Confirmation of these findings requires further study with other ethnic groups to analyze the associations between SNP alleles at multiple loci and human skin color variation. Our overarching goal is to use remote sensing data to clarify the interaction between atmospheric environments and SNP allelic frequency and investigate human adaptability to ultraviolet irradiation. Such information should greatly assist in the prediction of the health effects of future environmental changes such as ozone depletion and increased ultraviolet exposure. If such health effects are to some extent predictable, it might be possible to prepare for such changes in advance and thus reduce the extent of their impact.
We searched green plant (Arabidopsis and rice), green alga (Chlamydomonas reinhardtii) and red alga (Cyanidioschyzon merolae) genomes, plus two nucleomorph genomes (Bigelowiella natans and Guillardia theta) with profile hidden Markov models (HMMs) from curated alignments of known vertebrate/yeast Nups. Since the plant, algal and nucleomorph genomes all belong to the kingdom Plantae, and are evolutionarily distant from the outgroup (vertebrate/yeast) training set, we use the plant and algal genomes as internal positive controls for the sensitivity of the searches in nucleomorph genomes. We find numerous Nup homologues in all plant and free-living algal species, but none in either nucleomorph genome. BLAST searches using identified plant and algal Nups also failed to detect nucleomorph homologues. We conclude that nucleomorph Nup genes have either been lost, being replaced by host Nup genes, or, that nucleomorph Nup genes have been transferred to the host nucleus twice independently; once in the evolution of the red algal nucleomorph and once in the green algal nucleomorph.Biochemical networks are the backbones of physiological systems of organisms. Therefore, a biochemical network should be sufficiently robust (not sensitive) to tolerate genetic mutations and environmental changes in the evolutionary process. In this study, based on the robustness and sensitivity criteria of biochemical networks, the adaptive design rules are developed for natural selection in the evolutionary process. This will provide insights into the robust adaptive mechanism of biochemical networks in the evolutionary process. We find that if a mutated biochemical network satisfies the robustness and sensitivity criteria of natural selection, there is a high probability for the biochemical network to prevail during natural selection in the evolutionary process. Since there are various mutated biochemical networks that can satisfy these criteria but have some differences in phenotype, the biochemical networks increase their diversities in the evolutionary process. The robustness of a biochemical network enables co-option so that new phenotypes can be generated in evolution. The proposed robust adaptive design rules of natural selection gain much insight into the evolutionary mechanism and provide a systematic robust biochemical circuit design method of biochemical networks for biotechnological and therapeutic purposes in the future.Evolutionary knowledge is often used to facilitate computational attempts at gene function prediction. One rich source of evolutionary information is the relative rates of gene sequence divergence, and in this report we explore the connection between gene evolutionary rates and function. We performed a genome-scale evaluation of the relationship between evolutionary rates and functional annotations for the yeast Saccharomyces cerevisiae. Non-synonymous (dN) and synonymous (dS) substitution rates were calculated for 1,095 orthologous gene sets common to S. cerevisiae and six other closely related yeast species. Differences in evolutionary rates between pairs of genes (ΔdN & ΔdS) were then compared to their functional similarities (sGO), which were measured using Gene Ontology (GO) annotations. Substantial and statistically significant correlations were found between ΔdN and sGO, whereas there is no apparent relationship between ΔdS and sGO. 
These results are consistent with a mode of action for natural selection that is based on similar rates of elimination of deleterious protein coding sequence variants for functionally related genes. The connection between gene evolutionary rates and function was stronger than seen for phylogenetic profiles, which have previously been employed to inform functional inference. The co-evolution of functionally related yeast genes points to the relevance of specific function for the efficacy of natural selection and underscores the utility of gene evolutionary rates for functional predictions.Many of the estimated topologies in phylogenetic studies are presented with the bootstrap support for each of the splits in the topology indicated. If phylogenetic estimation is unbiased, high bootstrap support for a split suggests that there is a good deal of certainty that the split actually is present in the tree and low bootstrap support suggests that one or more of the taxa on one side of the estimated split might in reality be located with taxa on the other side. In the latter case the follow-up questions about how many and which of the taxa could reasonably be incorrectly placed as well as where they might alternatively be placed are not addressed through the presented bootstrap support. We present here an algorithm that finds the set of all trees with minimum bootstrap support for their splits greater than some given value. The output is a ranked list of trees, ranked according to the minimum bootstrap supports for splits in the trees. The number of such trees and their topologies provides useful supplementary information in bootstrap analyses about the reasons for low bootstrap support for splits. We also present ways of quantifying low bootstrap support by considering the set of all topologies with minimum bootstrap greater than some quantity as providing a confidence region of topologies. Using a double bootstrap we are able to choose a cutoff so that the set of topologies with minimum bootstrap support for a split greater than that cutoff gives an approximate 95% confidence region. As with bootstrap support one advantage of the methods is that they are generally applicable to the wide variety of phylogenetic estimation methods.Motivation: Although a great deal of progress is being made in the development of fast and reliable experimental techniques to extract genome-wide networks of protein-protein and protein-DNA interactions, the sequencing of new genomes proceeds at an even faster rate. That is why there is a considerable need for reliable methods of in-silico prediction of protein interaction based solely on sequence similarity information and known interactions from well-studied organisms. This problem can be solved if a dependency exists between sequence similarity and the conservation of the proteins’ functions. Results: In this paper, we introduce a novel probabilistic method for prediction of protein-protein interactions using a new empirical probabilistic formula describing the loss of interactions between homologous proteins during the course of evolution. This formula describes an evolutional process quite similar to the process of the Earth’s population growth. In addition, our method favors predictions confirmed by several interacting pairs over predictions coming from a single interacting pair. Our approach is useful in working with “noisy” data such as those coming from high-throughput experiments. We have generated predictions for five “model” organisms: H. sapiens, D. 
melanogaster, C. elegans, A. thaliana, and S. cerevisiae and evaluated the quality of these predictions.The General Time Reversible (GTR) model of nucleotide substitution is at the core of many distance-based and character-based phylogeny inference methods. The procedure described by Waddell and Steel (1997), for estimating distances and instantaneous substitution rate matrices, R, under the GTR model, is known to be inapplicable under some conditions, ie, it leads to the inapplicability of the GTR model. Here, we simulate the evolution of DNA sequences along 12 trees characterized by different combinations of tree length, (non-)homogeneity of the substitution rate matrix R, and sequence length. We then evaluate both the frequency of the GTR model inapplicability for estimating distances and the accuracy of inferred alignments. Our results indicate that, inapplicability of the Waddel and Steel’s procedure can be considered a real practical issue, and illustrate that the probability of this inapplicability is a function of substitution rates and sequence length. We also discuss the implications of our results on the current implementations of maximum likelihood and Bayesian methods.The utility of the matrix representation with flipping (MRF) supertree method has been limited by the speed of its heuristic algorithms. We describe a new heuristic algorithm for MRF supertree construction that improves upon the speed of the previous heuristic by a factor of n (the number of taxa in the supertree). This new heuristic makes MRF tractable for large-scale supertree analyses and allows the first comparisons of MRF with other supertree methods using large empirical data sets. Analyses of three published supertree data sets with between 267 to 571 taxa indicate that MRF supertrees are equally or more similar to the input trees on average than matrix representation with parsimony (MRP) and modified min-cut supertrees. The results also show that large differences may exist between MRF and MRP supertrees and demonstrate that the MRF supertree method is a practical and potentially more accurate alternative to the nearly ubiquitous MRP supertree method.The ATP binding cassette containing transporters are a superfamily of integral membrane proteins that translocate a wide range of substrates. The subfamily B members include the biologically important multidrug resistant (MDR) protein and the transporter associated with antigen processing (TAP) complex. Substrates translocated by this subfamily include drugs, lipids, peptides and iron. We have constructed a comprehensive set of comparative models for the transporters from eukaryotes and used these to study the effects of sequence divergence on the substrate translocation pathway. Notably, there is very little structural divergence between the bacterial template structure and the more distantly related eukaryotic proteins illustrating a need to conserve transporter structure. By contrast different properties have been adopted for the translocation pathway depending on the substrate type. A greater level of divergence in electrostatic properties is seen with transporters that have a broad substrate range both within and between species, while a high level of conservation is observed when the substrate range is narrow. 
This study represents the first effort towards understanding effect of evolution on subfamily B ABC transporters in the context of protein structure and biophysical properties.Matrix representation with parsimony (MRP) can be used to combine trees in the supertree or the consensus settings. However, despite its popularity, it is still unclear whether MRP is really a consensus method or whether it behaves more like the total evidence approach. Previous simulations have shown that it approximates total evidence trees, whereas other studies have depicted similarities with average consensus trees. In this paper, we assess the hypothesis that MRP is equally related to both approaches. We conducted a simulation study to evaluate the accuracy of total evidence with that or various consensus methods, including MRP. Our results show that the total evidence trees are not significantly more accurate than average consensus trees that accounts for branch lengths, but that both perform better than MRP trees in the consensus setting. The accuracy rate of all methods was similarly affected by the number of taxa, the number of partitions, and the heterogeneity of the data.Rates of species origination and extinction can vary over time during evolutionary radiations, and it is possible to reconstruct the history of diversification using molecular phylogenies of extant taxa only. Maximum likelihood methods provide a useful framework for inferring temporal variation in diversification rates. LASER is a package for the R programming environment that implements maximum likelihood methods based on the birth-death process to test whether diversification rates have changed over time. LASER contrasts the likelihood of phylogenetic data under models where diversification rates have changed over time to alternative models where rates have remained constant over time. Major strengths of the package include the ability to detect temporal increases in diversification rates and the inference of diversification parameters under multiple rate-variable models of diversification. The program and associated documentation are freely available from the R package archive at http://cran.r-project.org.A recent paper in this journal (Faith and Baker, 2006) described bio-informatics challenges in the application of the PD (phylogenetic diversity) measure of Faith (1992a), and highlighted the use of the root of the phylogenetic tree, as implied by the original definition of PD. A response paper (Crozier et al. 2006) stated that 1) the (Faith, 1992a) PD definition did not include the use of the root of the tree, and 2) Moritz and Faith (1998) changed the PD definition to include the root. Both characterizations are here refuted. Examples from Faith (1992a,Faith 1992b) document the link from the definition to the use of the root of the overall tree, and a survey of papers over the past 15 years by Faith and colleagues demonstrate that the stated PD definition has remained the same as that in the original 1992 study. PD’s estimation of biodiversity at the level of “feature diversity” is seen to have provided the original rationale for the measure’s consideration of the root of the phylogenetic tree.The multi-copy internal transcribed spacer (ITS) region of nuclear ribosomal DNA is widely used to infer phylogenetic relationships among closely related taxa. 
Here we use maximum likelihood (ML) and splits graph analyses to extract phylogenetic information from ~ 600 mostly cloned ITS sequences, representing 81 species and subspecies of Acer, and both species of its sister Dipteronia. Additional analyses compared sequence motifs in Acer and several hundred Anacardiaceae, Burseraceae, Meliaceae, Rutaceae, and Sapindaceae ITS sequences in GenBank. We also assessed the effects of using smaller data sets of consensus sequences with ambiguity coding (accounting for within-species variation) instead of the full (partly redundant) original sequences. Neighbor-nets and bipartition networks were used to visualize conflict among character state patterns. Species clusters observed in the trees and networks largely agree with morphology-based classifications; of de Jong’s (1994) 16 sections, nine are supported in neighbor-net and bipartition networks, and ten by sequence motifs and the ML tree; of his 19 series, 14 are supported in networks, motifs, and the ML tree. Most nodes had higher bootstrap support with matrices of 105 or 40 consensus sequences than with the original matrix. Within-taxon ITS divergence did not differ between diploid and polyploid Acer, and there was little evidence of differentiated parental ITS haplotypes, suggesting that concerted evolution in Acer acts rapidly.


Journal of Agricultural and Food Chemistry | 2017

Multi-imaging of Cytokinin and Abscisic Acid on the Roots of Rice (Oryza sativa) Using Matrix-Assisted Laser Desorption/Ionization Mass Spectrometry

Katsuhiro Shiono; Riho Hashizaki; Toyofumi Nakanishi; Tatsuko Sakai; Takushi Yamamoto; Koretsugu Ogata; Ken-ichi Harada; Hajime Ohtani; Hajime Katano; Shu Taira

Plant hormones act as important signaling molecules that regulate responses to abiotic stress as well as plant growth and development. Because hormone concentrations control the physiological responses of the target tissue, it is important to know their distributions and concentrations within that tissue. However, hormone concentrations in plant tissue are difficult to determine because of the limitations of conventional methods. Here, we report the first multi-imaging of two plant hormones, a cytokinin [trans-zeatin (tZ)] and abscisic acid (ABA), using matrix-assisted laser desorption/ionization time-of-flight mass spectrometry (MALDI-TOF-MS) imaging. Protonated signals of tZ (m/z 220.1) and ABA (m/z 265.3) were selected for MS imaging of longitudinal sections of rice roots. tZ was broadly distributed about 40 mm behind the root apex but was barely detectable at the apex, whereas ABA was detected mainly at the root apex. Multi-imaging with MALDI-TOF-MS thus enables the localization and quantification of plant hormones to be visualized, and the approach is applicable to a wide range of plant species grown under various environmental conditions.
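The ion-selection step described in this abstract can be illustrated with a short, generic script. The sketch below is not the authors' pipeline; it assumes the imaging run has been exported to the open imzML format (the file name roots.imzML is hypothetical) and uses the pyimzml parser to build intensity maps at the protonated masses of tZ and ABA reported above.

# Minimal sketch (assumed workflow, not the published code): extract ion images
# for tZ and ABA from an imzML export of a MALDI-TOF-MS imaging run.
from pyimzml.ImzMLParser import ImzMLParser, getionimage
import matplotlib.pyplot as plt

parser = ImzMLParser("roots.imzML")      # hypothetical imzML export of the root sections

targets = {
    "trans-zeatin [M+H]+": 220.1,        # protonated tZ signal from the abstract
    "ABA [M+H]+": 265.3,                 # protonated ABA signal from the abstract
}

fig, axes = plt.subplots(1, len(targets), figsize=(8, 4))
for ax, (label, mz) in zip(axes, targets.items()):
    # Sum intensities within +/- 0.25 Da of the target m/z at every pixel
    ion_image = getionimage(parser, mz, tol=0.25)
    ax.imshow(ion_image, cmap="viridis")
    ax.set_title(label)
    ax.axis("off")

plt.tight_layout()
plt.savefig("hormone_ion_images.png", dpi=300)

The mass tolerance of 0.25 Da is an illustrative choice; in practice it would be matched to the instrument's resolving power.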

Collaboration


Dive into Koretsugu Ogata's collaboration.

Top Co-Authors

Yoshimi Benno

Tokyo Medical and Dental University

Kiyoshi Tajima

National Agriculture and Food Research Organization

Sumiko Anno

Shibaura Institute of Technology
