Ari Löytynoja
University of Helsinki
Publications
Featured research published by Ari Löytynoja.
Science | 2008
Ari Löytynoja; Nick Goldman
Genetic sequence alignment is the basis of many evolutionary and comparative studies, and errors in alignments lead to errors in the interpretation of evolutionary information in genomes. Traditional multiple sequence alignment methods disregard the phylogenetic implications of gap patterns that they create and infer systematically biased alignments with excess deletions and substitutions, too few insertions, and implausible insertion-deletion–event histories. We present a method that prevents these systematic errors by recognizing insertions and deletions as distinct evolutionary events. We show theoretically and practically that this improves the quality of sequence alignments and downstream analyses over a wide range of realistic alignment problems. These results suggest that insertions and sequence turnover are more common than is currently thought and challenge the conventional picture of sequence evolution and mechanisms of functional and structural changes.
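The central idea, that an outgroup lets you tell an insertion apart from a deletion, can be illustrated with a simplified three-sequence parsimony rule. This is a hypothetical sketch, not the method from the paper. The function names are invented here, and it assumes a single outgroup and at most one indel event per column.

```python
GAP = "-"

def classify_gap_column(sister_a: str, sister_b: str, outgroup: str) -> str:
    """Classify one column for sister sequences A and B using an outgroup.

    If the outgroup has a residue where one sister has a gap, the residue was
    most parsimoniously present in the ancestor and was deleted in the gapped
    sister. If the outgroup also has a gap, the residue was most parsimoniously
    inserted in the sister that carries it.
    """
    a_gap, b_gap = sister_a == GAP, sister_b == GAP
    if a_gap and b_gap:
        return "gap in both sisters"
    if not a_gap and not b_gap:
        return "match"
    gapped = "A" if a_gap else "B"
    present = "B" if a_gap else "A"
    if outgroup == GAP:
        return f"insertion in {present}"
    return f"deletion in {gapped}"


def classify_alignment(a: str, b: str, out: str) -> list[str]:
    """Apply the column rule to three equal-length aligned rows."""
    return [classify_gap_column(x, y, z) for x, y, z in zip(a, b, out)]


# The same gap pattern in A and B is read differently depending on the outgroup.
print(classify_alignment("ACGT", "A-GT", "ACGT"))  # deletion in B at column 1
print(classify_alignment("ACGT", "A-GT", "A-GT"))  # insertion in A at column 1
```

A method that ignores the outgroup would score both cases identically. That is the kind of systematic bias toward deletions that the abstract describes.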
BMC Bioinformatics | 2010
Ari Löytynoja; Nick Goldman
Background: Phylogeny-aware progressive alignment has been found to perform well in phylogenetic alignment benchmarks and to produce superior alignments for the inference of selection on codon sequences. Its implementation in the PRANK alignment program package also allows modelling of complex evolutionary processes and inference of posterior probabilities for sequence sites evolving under each distinct scenario, either simultaneously with the alignment of sequences or as a post-processing step for an existing alignment. This has led to software with many advanced features, and users may find it difficult to generate optimal alignments, visualise the full information in their alignment results, or post-process these results, e.g. by objectively selecting subsets of alignment sites.
Results: We have created a web server called webPRANK that provides an easy-to-use interface to the PRANK phylogeny-aware alignment algorithm. The webPRANK server supports the alignment of DNA, protein and codon sequences as well as protein-translated alignment of cDNAs, and includes built-in structure models for the alignment of genomic sequences. The resulting alignments can be exported in various formats widely used in evolutionary sequence analyses. The webPRANK server also includes a powerful web-based alignment browser for the visualisation and post-processing of the results in the context of a cladogram relating the sequences, allowing (e.g.) removal of alignment columns with low posterior reliability. In addition to de novo alignments, webPRANK can be used for the inference of ancestral sequences with phylogenetically realistic gap patterns, and for the annotation and post-processing of existing alignments. The webPRANK server is freely available on the web at http://tinyurl.com/webprank.
Conclusions: The webPRANK server incorporates phylogeny-aware multiple sequence alignment, visualisation and post-processing in an easy-to-use web interface. It widens the user base of phylogeny-aware multiple sequence alignment and allows the performance of all alignment-related activity for small sequence analysis projects using only a standard web browser.
Bioinformatics | 2003
Ari Löytynoja; Michel C. Milinkovitch
MOTIVATION Progressive algorithms are widely used heuristics for the production of alignments among multiple nucleic-acid or protein sequences. Probabilistic approaches providing measures of global and/or local reliability of individual solutions would constitute valuable developments. RESULTS We present here a new method for multiple sequence alignment that combines an HMM approach, a progressive alignment algorithm, and a probabilistic evolution model describing the character substitution process. Our method works by iterating pairwise alignments according to a guide tree and defining each ancestral sequence from the pairwise alignment of its child nodes, thus progressively constructing a multiple alignment. Our method allows for the computation of each column's minimum posterior probability, and we show that this value correlates with the correctness of the result, hence providing an efficient means by which unreliably aligned columns can be filtered out from a multiple alignment.
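The column-filtering step can be sketched as follows. This sketch rests on several assumptions. Each internal node of the guide tree is taken to contribute one posterior probability per final alignment column. A column's reliability is taken to be the minimum of those values. The threshold of 0.5 is an arbitrary illustration. The function names are hypothetical.

```python
def column_min_posteriors(posteriors_per_node: list[list[float]]) -> list[float]:
    """Return, for each final column, the minimum posterior over all
    pairwise-alignment steps (one inner list per internal guide-tree node)."""
    return [min(column) for column in zip(*posteriors_per_node)]


def filter_columns(alignment: list[str], min_posteriors: list[float],
                   threshold: float = 0.5) -> list[str]:
    """Keep only columns whose minimum posterior reaches the threshold."""
    keep = [i for i, p in enumerate(min_posteriors) if p >= threshold]
    return ["".join(row[i] for i in keep) for row in alignment]


alignment = ["ACGT", "A-GT", "ACTT"]
# Hypothetical posteriors from two internal nodes of the guide tree.
per_node = [[0.99, 0.40, 0.95, 0.98],
            [0.97, 0.80, 0.60, 0.99]]
mins = column_min_posteriors(per_node)   # [0.97, 0.40, 0.60, 0.98]
print(filter_columns(alignment, mins))   # ['AGT', 'AGT', 'ATT']
```

Taking the minimum, rather than the mean, means that a column is flagged if any single step of the progressive procedure was uncertain about it.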
Bioinformatics | 2001
Ari Löytynoja; Michel C. Milinkovitch
SUMMARY SOAP is a stand-alone, multi-platform program to test the stability of a multiple alignment of molecular sequences.
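The summary does not say how stability is measured. One common approach, assumed here and not taken from the summary, is to realign the same sequences under different settings and count how often each column of a reference alignment reappears unchanged. Each column is identified by which residue of each sequence it contains. The function names are hypothetical.

```python
def column_signatures(alignment: list[str]) -> list[tuple]:
    """Describe each column by the index of each sequence's residue in that
    column (its position in the ungapped sequence), or None for a gap."""
    counters = [0] * len(alignment)
    signatures = []
    for col in range(len(alignment[0])):
        sig = []
        for s, row in enumerate(alignment):
            if row[col] == "-":
                sig.append(None)
            else:
                sig.append(counters[s])
                counters[s] += 1
        signatures.append(tuple(sig))
    return signatures


def column_stability(reference: list[str], alternatives: list[list[str]]) -> list[float]:
    """Fraction of alternative alignments that contain each reference column."""
    alt_sets = [set(column_signatures(alt)) for alt in alternatives]
    return [sum(sig in s for s in alt_sets) / len(alt_sets)
            for sig in column_signatures(reference)]


reference = ["ACGT", "A-GT"]
alternatives = [["ACGT", "AG-T"],   # the G of sequence 2 placed one column earlier
                ["ACGT", "A-GT"]]   # identical to the reference
print(column_stability(reference, alternatives))  # [1.0, 0.5, 0.5, 1.0]
```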
Methods in Molecular Biology | 2014
Ari Löytynoja
Evolutionary analyses require sequence alignments that correctly represent evolutionary homology. Evolutionary and structural homology are not the same, and sequence alignments generated with methods designed for structural matching can be seriously misleading in comparative and phylogenetic analyses. The phylogeny-aware alignment algorithm implemented in the program PRANK has been shown to produce good alignments for evolutionary inferences. Unlike other alignment programs, PRANK makes use of phylogenetic information to distinguish alignment gaps caused by insertions or deletions and, thereafter, handles the two types of events differently. As a by-product of the correct handling of insertions and deletions, PRANK can provide the inferred ancestral sequences as a part of the output and mark the alignment gaps differently depending on their origin in insertion or deletion events. As the algorithm infers the evolutionary history of the sequences, PRANK can be sensitive to errors in the guide phylogeny and violations of the underlying assumptions about the origin and patterns of gaps. These issues are discussed in detail and practical advice for the use of PRANK in evolutionary analysis is provided. The PRANK software and other methods discussed here can be found on the program home page at http://code.google.com/p/prank-msa/.
Bioinformatics | 2012
Ari Löytynoja; Albert J. Vilella; Nick Goldman
Motivation: Accurate alignment of large numbers of sequences is demanding, and the computational burden is further increased by downstream analyses depending on these alignments. With the abundance of sequence data, an integrative approach of adding new sequences to existing alignments without their full re-computation and maintaining the relative matching of existing sequences is an attractive option. Another current challenge is the extension of reference alignments with fragmented sequences, such as those coming from next-generation metagenomics, that contain relatively little information. Widely used methods for alignment extension are based on profile representation of reference sequences. These do not use phylogenetic information and are affected by the composition of the reference alignment and the phylogenetic positions of query sequences. Results: We have developed a method for phylogeny-aware alignment of partial-order sequence graphs and apply it here to the extension of alignments with new data. Our new method, called PAGAN, infers ancestral sequences for the reference alignment and adds new sequences in their phylogenetic context, either to predefined positions or by finding the best placement for sequences of unknown origin. Unlike profile-based alternatives, PAGAN considers the phylogenetic relatedness of the sequences and is not affected by inclusion of more diverged sequences in the reference set. Our analyses show that PAGAN outperforms alternative methods for alignment extension and provides superior accuracy for both DNA and protein data, the improvement being especially large for fragmented sequences. Moreover, PAGAN-generated alignments of noisy next-generation sequencing (NGS) sequences are accurate enough for the use of RNA-seq data in evolutionary analyses. Availability: PAGAN is written in C++, licensed under the GPL, and its source code is available at http://code.google.com/p/pagan-msa.
Supplementary information: Supplementary data are available at Bioinformatics online.
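The "best placement" idea can be illustrated by scoring a query fragment against the inferred ancestral sequence at each node and keeping the highest-scoring node. PAGAN itself works on partial-order sequence graphs under a phylogenetic model. This sketch substitutes plain strings and a simple fitting alignment, with free end gaps in the reference so that short fragments are not penalised for overhang. The scores, node names and functions are all hypothetical.

```python
def fit_score(query: str, ref: str, match: int = 1, mismatch: int = -1, gap: int = -2) -> int:
    """Best score for aligning all of `query` inside `ref`, with unaligned
    reference residues at either end left unpenalised."""
    n, m = len(query), len(ref)
    prev = [0] * (m + 1)                  # leading reference overhang is free
    for i in range(1, n + 1):
        cur = [prev[0] + gap] + [0] * m   # query prefix aligned to nothing
        for j in range(1, m + 1):
            diag = prev[j - 1] + (match if query[i - 1] == ref[j - 1] else mismatch)
            cur[j] = max(diag, prev[j] + gap, cur[j - 1] + gap)
        prev = cur
    return max(prev)                      # trailing reference overhang is free


def best_placement(query: str, ancestors: dict[str, str]) -> str:
    """Return the node whose (hypothetical) ancestral sequence fits best."""
    return max(ancestors, key=lambda node: fit_score(query, ancestors[node]))


ancestors = {"node1": "ACGTACGTAA", "node2": "TTTTGGGCCC"}
print(best_placement("GTAC", ancestors))  # node1
```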
Environmental Microbiology | 2012
Haiwei Luo; Ari Löytynoja; Mary Ann Moran
Understanding of the ecological roles and evolutionary histories of marine bacterial taxa can be complicated by mismatches in genome content between wild populations and their better-studied cultured relatives. We used computed patterns of non-synonymous (amino acid-altering) nucleotide diversity in marine metagenomic data to provide high-confidence identification of DNA fragments from uncultivated members of the Roseobacter clade, an abundant taxon of heterotrophic marine bacterioplankton in the world's oceans. Differences in gene stoichiometry in the Global Ocean Survey metagenomic data set compared with 39 sequenced isolates indicated that natural Roseobacter populations differ systematically in several genomic attributes from their cultured representatives, including fewer genes for signal transduction and cell surface modifications but more genes for Sec-like protein secretion systems, anaplerotic CO(2) incorporation, and phosphorus and sulfate uptake. Several of these trends match well with characteristics previously identified as distinguishing r- versus K-selected ecological strategies in bacteria, suggesting that the r-strategist model assigned to cultured roseobacters may be less applicable to their free-living oceanic counterparts. The metagenomic Roseobacter DNA fragments revealed several traits with evolutionary histories suggestive of horizontal gene transfer from other marine bacterioplankton taxa or viruses, including pyrophosphatases and glycosylation proteins.
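Non-synonymous diversity builds on classifying nucleotide differences between aligned coding sequences as amino acid-altering or silent. The sketch below shows only that classification step under the standard genetic code. It is a simplification: codons that differ at more than one position, or that contain gaps, are skipped. It also does not normalise by the number of synonymous and non-synonymous sites, as a full diversity estimate would. The function name is hypothetical.

```python
BASES = "TCAG"
# Standard genetic code, codons ordered TTT, TTC, TTA, TTG, TCT, ... GGG.
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODE = {a + b + c: AMINO[16 * i + 4 * j + k]
        for i, a in enumerate(BASES)
        for j, b in enumerate(BASES)
        for k, c in enumerate(BASES)}


def classify_differences(seq1: str, seq2: str) -> tuple[int, int]:
    """Count (synonymous, non-synonymous) differences between two aligned,
    in-frame coding sequences, using only codons that differ at one position."""
    seq1, seq2 = seq1.upper(), seq2.upper()
    syn = nonsyn = 0
    length = min(len(seq1), len(seq2))
    for p in range(0, length - length % 3, 3):
        c1, c2 = seq1[p:p + 3], seq2[p:p + 3]
        if c1 not in CODE or c2 not in CODE:
            continue  # gapped or ambiguous codon
        if sum(x != y for x, y in zip(c1, c2)) == 1:
            if CODE[c1] == CODE[c2]:
                syn += 1
            else:
                nonsyn += 1
    return syn, nonsyn


# GAA (Glu) -> GAT (Asp) alters the amino acid; GGT -> GGC (both Gly) does not.
print(classify_differences("GAAGGT", "GATGGC"))  # (1, 1)
```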
Proceedings of the National Academy of Sciences of the United States of America | 2001
Ari Löytynoja; Michel C. Milinkovitch
We investigated the basal phylogeny of eukaryotes through analyses of sequences from the ADP–ATP mitochondrial carrier, a transmembrane protein that is stable in function across eukaryote kingdoms. The ADP–ATP data strongly suggest the grouping of Plantae and Fungi to the exclusion of Metazoa. We implemented several procedures to avoid pervasive analytical artifacts such as erroneous alignment, random rooting, long branch attraction, and misidentification of noisy characters. The quest for a eukaryote tree that would be largely consistent across multiple loci might be essentially illusory because of differential lineage sorting, horizontal gene transfer, and the chimeric nature of early eukaryotes. Better understanding of these evolutionary parameters, requiring separate phylogenetic analyses of multiple independent loci, is fundamental for resolution of the modes of emergence and evolution of the major eukaryote lineages.
Philosophical Transactions of the Royal Society B | 2008
Ari Löytynoja; Nick Goldman
We have developed a phylogeny-aware progressive alignment method that recognizes insertions and deletions as distinct evolutionary events and thus avoids systematic errors created by traditional alignment methods. We now extend this method to simultaneously model regional heterogeneity and evolution. This novel method can be flexibly adapted to alignment of nucleotide or amino acid sequences evolving under processes that vary over genomic regions and, being fully probabilistic, provides an estimate of regional heterogeneity of the evolutionary process along the alignment and a measure of local reliability of the solution. Furthermore, the evolutionary modelling of the substitution process permits adjusting the sensitivity and specificity of the alignment and, if high specificity is the aim, leaving sequences unaligned when their divergence is beyond a meaningful detection of homology.
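A per-column estimate of regional heterogeneity can be obtained by posterior decoding in a hidden Markov model whose hidden states are evolutionary categories. The sketch below assumes that the alignment is already fixed and that the likelihood of each column under each category is given. The paper's method infers these quantities jointly with the alignment, which this sketch does not attempt. It uses the standard scaled forward-backward algorithm, and the inputs are hypothetical numbers.

```python
def site_category_posteriors(emissions: list[list[float]],
                             transition: list[list[float]],
                             initial: list[float]) -> list[list[float]]:
    """Posterior probability of each category at each column.

    emissions[t][k]: likelihood of column t under category k.
    transition[j][k]: probability of moving from category j to k between columns.
    Forward and backward vectors are rescaled at every column to avoid
    underflow; the scaling constants cancel in the final normalisation.
    """
    n, K = len(emissions), len(initial)
    fwd = []
    prev = [initial[k] * emissions[0][k] for k in range(K)]
    total = sum(prev)
    prev = [x / total for x in prev]
    fwd.append(prev)
    for t in range(1, n):
        cur = [emissions[t][k] * sum(prev[j] * transition[j][k] for j in range(K))
               for k in range(K)]
        total = sum(cur)
        prev = [x / total for x in cur]
        fwd.append(prev)

    bwd = [[1.0] * K for _ in range(n)]
    for t in range(n - 2, -1, -1):
        b = [sum(transition[k][j] * emissions[t + 1][j] * bwd[t + 1][j] for j in range(K))
             for k in range(K)]
        total = sum(b)
        bwd[t] = [x / total for x in b]

    posteriors = []
    for t in range(n):
        p = [fwd[t][k] * bwd[t][k] for k in range(K)]
        total = sum(p)
        posteriors.append([x / total for x in p])
    return posteriors


# Two hypothetical categories (e.g. slow- and fast-evolving); the last column fits category 1.
post = site_category_posteriors([[0.9, 0.1], [0.9, 0.1], [0.1, 0.9]],
                                [[0.9, 0.1], [0.1, 0.9]], [0.5, 0.5])
print([round(row[1], 3) for row in post])
```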
BMC Bioinformatics | 2007
Matti Kankainen; Ari Löytynoja
Background: Sequence motifs representing transcription factor binding sites (TFBS) are commonly encoded as position frequency matrices (PFM) or degenerate consensus sequences (CS). These formats are used to represent the characterised TFBS profiles stored in transcription factor databases, as well as to represent the potential motifs predicted using computational methods. To fill the gap between the known and predicted motifs, methods are needed for the post-processing of prediction results, i.e. for matching, comparison and clustering of pre-selected motifs. The computational identification of over-represented motifs in sets of DNA sequences is, in particular, a task where post-processing can dramatically simplify the analysis. Efficient post-processing, for example, reduces the redundancy of the motifs predicted and enables them to be annotated.
Results: In order to facilitate the post-processing of motifs, in both PFM and CS formats, we have developed a tool called Matlign. The tool aligns and evaluates the similarity of motifs using a combination of scoring functions, and visualises the results using hierarchical clustering. By limiting the number of distinct gaps created (though not their length), the alignment algorithm also correctly aligns motifs with an internal spacer. The method selects the best non-redundant motif set, with repetitive motifs merged together, by cutting the hierarchical tree using silhouette values. Our analyses show that Matlign can reliably discover the most similar analogue from a collection of characterised regulatory elements, so the method is also useful for the annotation of motif predictions by PFM library searches.
Conclusion: Matlign is a user-friendly tool for post-processing large collections of DNA sequence motifs. Starting from a large number of potential regulatory motifs, Matlign provides a researcher with a non-redundant set of motifs, which can then be further associated with known regulatory elements. A web server is available at http://ekhidna.biocenter.helsinki.fi/poxo/matlign.
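Comparing two PFMs starts with a column-to-column similarity and a search over relative offsets. Matlign combines several scoring functions and allows a limited number of gaps. This sketch instead uses a single common measure, the Pearson correlation between base-frequency columns, and tries only ungapped offsets. It is therefore a simplified illustration, not Matlign's scoring. The function names and the minimum overlap of 3 columns are assumptions.

```python
import math


def pearson(x: list[float], y: list[float]) -> float:
    """Pearson correlation of two base-frequency columns (0.0 if either is flat)."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return 0.0 if sx == 0 or sy == 0 else cov / (sx * sy)


def best_ungapped_match(pfm1, pfm2, min_overlap: int = 3):
    """Return (mean column correlation, offset) for the best ungapped overlap.
    Each PFM is a list of [A, C, G, T] columns; column j of pfm2 is placed
    against column j + offset of pfm1."""
    best = (-math.inf, None)
    for offset in range(-(len(pfm2) - min_overlap), len(pfm1) - min_overlap + 1):
        pairs = [(pfm1[i], pfm2[i - offset])
                 for i in range(len(pfm1)) if 0 <= i - offset < len(pfm2)]
        if len(pairs) < min_overlap:
            continue
        score = sum(pearson(a, b) for a, b in pairs) / len(pairs)
        if score > best[0]:
            best = (score, offset)
    return best


def one_hot(consensus: str):
    """Turn a plain consensus string into a PFM with one base per column."""
    return [[1.0 if base == b else 0.0 for b in "ACGT"] for base in consensus]


score, offset = best_ungapped_match(one_hot("ACGT"), one_hot("CGT"))
print(round(score, 6), offset)  # 1.0 1
```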