Publication


Featured research published by Kazutaka Katoh.


Methods in Molecular Biology | 2009

Multiple alignment of DNA sequences with MAFFT.

Kazutaka Katoh; George Asimenos; Hiroyuki Toh

Multiple alignment of DNA sequences is an important step in various molecular biological analyses. As a large amount of sequence data is becoming available through genome and other large-scale sequencing projects, scalability, as well as accuracy, is currently required for a multiple sequence alignment (MSA) program. In this chapter, we outline the algorithms of an MSA program MAFFT and provide practical advice, focusing on several typical situations a biologist sometimes faces. For genome alignment, which is beyond the scope of MAFFT, we introduce two tools: TBA and MAUVE.


Bioinformatics | 2010

Parallelization of the MAFFT multiple sequence alignment program

Kazutaka Katoh; Hiroyuki Toh

Summary: Multiple sequence alignment (MSA) is an important step in comparative sequence analyses. Parallelization is a key technique for reducing the time required for large-scale sequence analyses. The three calculation stages, all-to-all comparison, progressive alignment and iterative refinement, of the MAFFT MSA program were parallelized using the POSIX Threads library. Two natural parallelization strategies (best-first and simple hill-climbing) were implemented for the iterative refinement stage. Based on comparisons of the objective scores and benchmark scores between the two approaches, we selected a simple hill-climbing approach as the default. Availability: The parallelized version of MAFFT is available at http://mafft.cbrc.jp/alignment/software/. This version currently supports the Linux operating system only. Contact: [email protected] Supplementary information: Supplementary data are available at Bioinformatics online.
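The all-to-all comparison stage mentioned above splits naturally into independent pairwise tasks. The following is a minimal sketch of that decomposition only, not MAFFT's implementation: MAFFT uses POSIX Threads in C, whereas this uses Python's `concurrent.futures`, and `toy_distance` is a hypothetical stand-in for a real pairwise score.

```python
from concurrent.futures import ThreadPoolExecutor
from itertools import combinations


def toy_distance(a: str, b: str, k: int = 3) -> float:
    """Hypothetical stand-in score: 1 - Jaccard similarity of k-mer sets."""
    kmers_a = {a[i:i + k] for i in range(len(a) - k + 1)}
    kmers_b = {b[i:i + k] for i in range(len(b) - k + 1)}
    union = kmers_a | kmers_b
    if not union:
        return 0.0
    return 1.0 - len(kmers_a & kmers_b) / len(union)


def all_to_all(seqs, workers=4):
    """Compute every pairwise distance, one independent task per pair."""
    pairs = list(combinations(range(len(seqs)), 2))
    with ThreadPoolExecutor(max_workers=workers) as pool:
        # Each pair is independent, so the tasks can be handed to any worker.
        dists = pool.map(lambda p: toy_distance(seqs[p[0]], seqs[p[1]]), pairs)
        return dict(zip(pairs, dists))


if __name__ == "__main__":
    example = ["ACGTACGT", "ACGTACGT", "TTTTTTTT"]
    for pair, dist in all_to_all(example).items():
        print(pair, dist)
```

In CPython, threads do not speed up CPU-bound work like this because of the global interpreter lock, so the sketch illustrates only how the work is divided, not the speedup reported for the threaded C implementation.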


BMC Bioinformatics | 2008

Improved accuracy of multiple ncRNA alignment by incorporating structural information into a MAFFT-based framework

Kazutaka Katoh; Hiroyuki Toh

Background: Structural alignment of RNAs is becoming important, since the discovery of functional non-coding RNAs (ncRNAs). Recent studies, mainly based on various approximations of the Sankoff algorithm, have resulted in considerable improvement in the accuracy of pairwise structural alignment. In contrast, for the cases with more than two sequences, the practical merit of structural alignment remains unclear as compared to traditional sequence-based methods, although the importance of multiple structural alignment is widely recognized. Results: We took a different approach from a straightforward extension of the Sankoff algorithm to the multiple alignments from the viewpoints of accuracy and time complexity. As a new option of the MAFFT alignment program, we developed a multiple RNA alignment framework, X-INS-i, which builds a multiple alignment with an iterative method incorporating structural information through two components: (1) pairwise structural alignments by an external pairwise alignment method such as SCARNA or LaRA and (2) a new objective function, Four-way Consistency, derived from the base-pairing probability of every sub-aligned group at every multiple alignment stage. Conclusion: The BRAliBASE benchmark showed that X-INS-i outperforms other methods currently available in the sum-of-pairs score (SPS) criterion. As a basis for predicting common secondary structure, the accuracy of the present method is comparable to or rather higher than those of the current leading methods such as RNA Sampler. The X-INS-i framework can be used for building a multiple RNA alignment from any combination of algorithms for pairwise RNA alignment and base-pairing probability. The source code is available at the webpage found in the Availability and requirements section.


Briefings in Bioinformatics | 2017

MAFFT online service: multiple sequence alignment, interactive sequence choice and visualization

Kazutaka Katoh; John Rozewicki; Kazunori Yamada

This article describes several features in the MAFFT online service for multiple sequence alignment (MSA). As a result of recent advances in sequencing technologies, huge numbers of biological sequences are available and the need for MSAs with large numbers of sequences is increasing. To extract biologically relevant information from such data, sophistication of algorithms is necessary but not sufficient. Intuitive and interactive tools that let experimental biologists handle large data semiautomatically are becoming important. We are developing MAFFT in both of these directions. Here, we explain (i) the Web interface for recently developed options for large data and (ii) interactive usage to refine sequence data sets and MSAs.


Journal of Molecular Evolution | 1999

Monophyly of lampreys and hagfishes supported by nuclear DNA-coded genes.

Shigehiro Kuraku; Daisuke Hoshiyama; Kazutaka Katoh; Hiroshi Suga; Takashi Miyata

The phylogenetic position of hagfishes in vertebrate evolution is currently controversial. The 18S and 28S rRNA trees support the monophyly of hagfishes and lampreys. In contrast, the mitochondrial DNAs suggest the close association of lampreys and gnathostomes. To clarify this controversial issue, we have conducted cloning and sequencing of the four nuclear DNA-coded single-copy genes encoding the triose phosphate isomerase, calreticulin, and the largest subunit of RNA polymerase II and III. Based on these proteins, together with the Mn superoxide dismutase for which hagfish and lamprey sequences are available in databases, phylogenetic trees have been inferred by the maximum likelihood (ML) method of protein phylogeny. It was shown that all five proteins prefer the monophyletic tree of cyclostomes, and the total log-likelihood of the five proteins significantly supports the cyclostome monophyly at the level of ±1 SE. The ML trees of the aldolase family comprising three nonallelic isoforms and the complement component group comprising C3, C4, and C5, both of which diverged during vertebrate evolution by gene duplications, also suggest the cyclostome monophyly.


BMC Biology | 2004

Basal jawed vertebrate phylogeny inferred from multiple nuclear DNA-coded genes

Kanae Kikugawa; Kazutaka Katoh; Shigehiro Kuraku; Hiroshi Sakurai; Osamu Ishida; Naoyuki Iwabe; Takashi Miyata

Background: Phylogenetic analyses of jawed vertebrates based on mitochondrial sequences often result in confusing inferences which are obviously inconsistent with generally accepted trees. In particular, in a hypothesis by Rasmussen and Arnason based on mitochondrial trees, cartilaginous fishes have a terminal position in a paraphyletic cluster of bony fishes. No previous analysis based on nuclear DNA-coded genes could significantly reject the mitochondrial trees of jawed vertebrates. Results: We have cloned and sequenced seven nuclear DNA-coded genes from 13 vertebrate species. These sequences, together with sequences available from databases including 13 jawed vertebrates from eight major groups (cartilaginous fishes, bichir, chondrosteans, gar, bowfin, teleost fishes, lungfishes and tetrapods) and an outgroup (a cyclostome and a lancelet), have been subjected to phylogenetic analyses based on the maximum likelihood method. Conclusion: Cartilaginous fishes have been inferred to be basal to other jawed vertebrates, which is consistent with the generally accepted view. The minimum log-likelihood difference between the maximum likelihood tree and trees not supporting the basal position of cartilaginous fishes is 18.3 ± 13.1. The hypothesis by Rasmussen and Arnason has been significantly rejected with the minimum log-likelihood difference of 123 ± 23.3. Our tree has also shown that living holosteans, comprising bowfin and gar, form a monophyletic group which is the sister group to teleost fishes. This is consistent with a formerly prevalent view of vertebrate classification, although inconsistent with both of the current morphology-based and mitochondrial sequence-based trees. Furthermore, the bichir has been shown to be the basal ray-finned fish. Tetrapods and lungfish have formed a monophyletic cluster in the tree inferred from the concatenated alignment, being consistent with the currently prevalent view. It also remains possible that tetrapods are more closely related to ray-finned fishes than to lungfishes.
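One way to read the two log-likelihood differences quoted above is to express each as a multiple of its standard error. The short sketch below does only that arithmetic; the helper name is hypothetical, and the abstract does not state which threshold the authors used to call a difference significant.

```python
def difference_in_se_units(delta_lnl: float, se: float) -> float:
    """Express a log-likelihood difference as a multiple of its standard error."""
    return delta_lnl / se


if __name__ == "__main__":
    # Values quoted in the abstract above.
    basal_position = difference_in_se_units(18.3, 13.1)
    rasmussen_arnason = difference_in_se_units(123, 23.3)
    print(f"basal position alternatives: {basal_position:.2f} SE")
    print(f"Rasmussen and Arnason hypothesis: {rasmussen_arnason:.2f} SE")
```

The first difference is well under two standard errors while the second exceeds five, which matches the abstract reserving the phrase "significantly rejected" for the Rasmussen and Arnason hypothesis.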


Bioinformatics | 2012

Adding unaligned sequences into an existing alignment using MAFFT and LAST

Kazutaka Katoh; Martin C. Frith

Two methods to add unaligned sequences into an existing multiple sequence alignment have been implemented as the ‘--add’ and ‘--addfragments’ options in the MAFFT package. The former option is a basic one and applicable only to full-length sequences, whereas the latter option is applicable even when the unaligned sequences are short and fragmentary. These methods internally infer the phylogenetic relationship among the sequences in the existing alignment and the phylogenetic positions of unaligned sequences. Benchmarks based on two independent simulations consistently suggest that the ‘--addfragments’ option outperforms recent methods, PaPaRa and PAGAN, in accuracy for difficult problems and that these three methods appropriately handle easy problems. Availability: http://mafft.cbrc.jp/alignment/software/ Contact: [email protected] Supplementary information: Supplementary data are available at Bioinformatics online.
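As a rough illustration of how the two options named above might be invoked, the sketch below builds the argument list for a MAFFT call. The helper name and the file names are hypothetical, and the argument order (new sequences after the option, existing alignment last) is an assumption to check against the MAFFT manual for your installed version.

```python
import shlex


def mafft_add_command(new_seqs: str, existing_alignment: str,
                      fragments: bool = False) -> list:
    """Build an argument list for adding sequences to an existing alignment.

    fragments=False selects --add (full-length sequences);
    fragments=True selects --addfragments (short, fragmentary sequences).
    """
    option = "--addfragments" if fragments else "--add"
    return ["mafft", option, new_seqs, existing_alignment]


if __name__ == "__main__":
    # Hypothetical file names; the command is printed, not executed.
    print(shlex.join(mafft_add_command("new_full_length.fa", "existing_aln.fa")))
    print(shlex.join(mafft_add_command("short_reads.fa", "existing_aln.fa",
                                       fragments=True)))
```

The list form can be passed to `subprocess.run` with standard output redirected to a file, since MAFFT writes the resulting alignment to standard output.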


Nucleic Acids Research | 2013

aLeaves facilitates on-demand exploration of metazoan gene family trees on MAFFT sequence alignment server with enhanced interactivity

Shigehiro Kuraku; Christian M. Zmasek; Osamu Nishimura; Kazutaka Katoh

We report a new web server, aLeaves (http://aleaves.cdb.riken.jp/), for homologue collection from diverse animal genomes. In molecular comparative studies involving multiple species, orthology identification is the basis on which most subsequent biological analyses rely. It can be achieved most accurately by explicit phylogenetic inference. More and more species are subjected to large-scale sequencing, but the resultant resources are scattered in independent project-based, and multi-species, but separate, web sites. This complicates data access and is becoming a serious barrier to the comprehensiveness of molecular phylogenetic analysis. aLeaves, launched to overcome this difficulty, collects sequences similar to an input query sequence from various data sources. The collected sequences can be passed on to the MAFFT sequence alignment server (http://mafft.cbrc.jp/alignment/server/), which has been significantly improved in interactivity. This update enables switching between (i) sequence selection using the Archaeopteryx tree viewer, (ii) multiple sequence alignment and (iii) tree inference. This can be performed as a loop until one reaches a sensible data set, which minimizes redundancy for better visibility and handling in phylogenetic inference while covering relevant taxa. The workflow achieved by the seamless link between aLeaves and MAFFT provides a convenient online platform to address various questions in zoology and evolutionary biology.


Journal of Molecular Evolution | 2001

Genetic Algorithm-Based Maximum-Likelihood Analysis for Molecular Phylogeny

Kazutaka Katoh; Kei-ichi Kuma; Takashi Miyata

A heuristic approach to search for the maximum-likelihood (ML) phylogenetic tree based on a genetic algorithm (GA) has been developed. It outputs the best tree as well as multiple alternative trees that are not significantly worse than the best one on the basis of the likelihood criterion. These near-optimum trees are subjected to further statistical tests. This approach enables one to infer phylogenetic trees of over 20 taxa, taking account of the rate heterogeneity among sites, on practical time scales on a PC cluster. Computer simulations were conducted to compare the efficiency of the present approach with that of several likelihood-based methods and distance-based methods, using amino acid sequence data of relatively large numbers (5–24) of taxa. The superiority of the ML method over distance-based methods increases as the condition of simulations becomes more realistic (an incorrect model is assumed or many taxa are involved). This approach was applied to the inference of the universal tree based on the concatenated amino acid sequences of vertically descendent genes that are shared among all genomes whose complete sequences have been reported. The inferred tree strongly supports that Archaea is paraphyletic and Eukarya is specifically related to Crenarchaeota. Apart from the paraphyly of Archaea and some minor disagreements, the universal tree based on these genes is largely consistent with the universal tree based on SSU rRNA.


Bioinformatics | 2007

PartTree: an algorithm to build an approximate tree from a large number of unaligned sequences

Kazutaka Katoh; Hiroyuki Toh

Motivation: To construct a multiple sequence alignment (MSA) of a large number (> approximately 10,000) of sequences, the calculation of a guide tree with a complexity of O(N^2) to O(N^3), where N is the number of sequences, is the most time-consuming process. Results: To overcome this limitation, we have developed an approximate algorithm, PartTree, to construct a guide tree with an average time complexity of O(N log N). The new MSA method with the PartTree algorithm can align approximately 60,000 sequences in several minutes on a standard desktop computer. The loss of accuracy in MSA caused by this approximation was estimated to be several percent in benchmark tests using Pfam. Availability: The present algorithm has been implemented in the MAFFT sequence alignment package (http://align.bmr.kyushu-u.ac.jp/mafft/software/). Supplementary information: Supplementary information is available at Bioinformatics online.
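To give a feel for the gap between the quadratic and N log N growth rates quoted above, the sketch below compares the two for the sequence counts mentioned in the abstract. It ignores constant factors and lower-order terms, uses a base-2 logarithm as an arbitrary choice, and says nothing about PartTree's actual running time; the function name is hypothetical.

```python
import math


def quadratic_over_nlogn(n: int) -> float:
    """Ratio of N^2 to N*log2(N) operation counts, ignoring constant factors."""
    return (n ** 2) / (n * math.log2(n))


if __name__ == "__main__":
    for n in (10_000, 60_000):
        ratio = quadratic_over_nlogn(n)
        print(f"N = {n}: N^2 is roughly {ratio:,.0f} times N log2 N")
```

Under these assumptions the ratio is in the hundreds at N = 10,000 and in the thousands at N = 60,000, which is why the guide-tree step dominates as the number of sequences grows.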

Collaboration


Dive into Kazutaka Katoh's collaborations.

Top Co-Authors

Kazunori Yamada

National Institute of Advanced Industrial Science and Technology
