Leena Salmela
University of Helsinki
Publication
Featured research published by Leena Salmela.
Bioinformatics | 2011
Leena Salmela; Jan Schröder
MOTIVATION Current sequencing technologies produce a large number of erroneous reads. The sequencing errors present a major challenge in utilizing the data in de novo sequencing projects as assemblers have difficulties in dealing with errors. RESULTS We present Coral which corrects sequencing errors by forming multiple alignments. Unlike previous tools for error correction, Coral can also utilize bases distant from the error in the correction process because the whole read is present in the alignment. Coral is easily adjustable to reads produced by different sequencing technologies like Illumina Genome Analyzer and Roche/454 Life Sciences sequencing platforms because the sequencing error model can be defined by the user. We show that our method is able to reduce the error rate of reads more than previous methods. AVAILABILITY The source code of Coral is freely available at http://www.cs.helsinki.fi/u/lmsalmel/coral/.
Bioinformatics | 2010
Leena Salmela
MOTIVATION High-throughput sequencing technologies produce large sets of short reads that may contain errors. These sequencing errors make de novo assembly challenging. Error correction aims to reduce the error rate prior to assembly. Many de novo sequencing projects use reads from several sequencing technologies to get the benefits of all used technologies and to alleviate their shortcomings. However, combining such a mixed set of reads is problematic as many tools are specific to one sequencing platform. The SOLiD sequencing platform is especially problematic in this regard because of the two-base color coding of the reads. Therefore, new tools for working with mixed read sets are needed. RESULTS We present an error correction tool for correcting substitutions, insertions and deletions in a mixed set of reads produced by various sequencing platforms. We first develop a method for correcting reads from any sequencing technology producing base space reads such as the SOLEXA/Illumina and Roche/454 Life Sciences sequencing platforms. We then further refine the algorithm to correct the color space reads from the Applied Biosystems SOLiD sequencing platform together with normal base space reads. Our new tool is based on the SHREC program that is aimed at correcting SOLEXA/Illumina reads. Our experiments show that we can detect errors with 99% sensitivity and >98% specificity if the combined sequencing coverage of the sets is at least 12. We also show that the error rate of the reads is greatly reduced. AVAILABILITY The JAVA source code is freely available at http://www.cs.helsinki.fi/u/lmsalmel/hybrid-shrec/ CONTACT [email protected]
Nature Communications | 2014
Virpi Ahola; Rainer Lehtonen; Panu Somervuo; Leena Salmela; Patrik Koskinen; Pasi Rastas; Niko Välimäki; Lars Paulin; Jouni Kvist; Niklas Wahlberg; Jaakko Tanskanen; Emily A. Hornett; Laura Ferguson; Shiqi Luo; Zijuan Cao; Maaike de Jong; Anne Duplouy; Olli-Pekka Smolander; Heiko Vogel; Rajiv C. McCoy; Kui Qian; Wong Swee Chong; Qin Zhang; Freed Ahmad; Jani K. Haukka; Aruj Joshi; Jarkko Salojärvi; Christopher W. Wheat; Ewald Grosse-Wilde; Daniel C. Hughes
Previous studies have reported that chromosome synteny in Lepidoptera has been well conserved, yet the number of haploid chromosomes varies widely from 5 to 223. Here we report the genome (393 Mb) of the Glanville fritillary butterfly (Melitaea cinxia; Nymphalidae), a widely recognized model species in metapopulation biology and eco-evolutionary research, which has the putative ancestral karyotype of n=31. Using phylogenetic analyses of Nymphalidae and of other Lepidoptera, combined with orthologue-level comparisons of chromosomes, we conclude that the ancestral lepidopteran karyotype has been n=31 for at least 140 My. We show that fusion chromosomes have retained the ancestral chromosome segments and very few rearrangements have occurred across the fusion sites. The same, shortest ancestral chromosomes have independently participated in fusion events in species with smaller karyotypes. The short chromosomes have a higher rearrangement rate than long ones. These characteristics highlight distinctive features of the evolutionary dynamics of butterflies and moths.
Bioinformatics | 2014
Leena Salmela; Eric Rivals
Motivation: PacBio single molecule real-time sequencing is a third-generation sequencing technique producing long reads, with comparatively lower throughput and higher error rate. Errors include numerous indels and complicate downstream analysis like mapping or de novo assembly. A hybrid strategy that takes advantage of the high accuracy of second-generation short reads has been proposed for correcting long reads. Mapping of short reads on long reads provides sufficient coverage to eliminate up to 99% of errors, however, at the expense of prohibitive running times and considerable amounts of disk and memory space. Results: We present LoRDEC, a hybrid error correction method that builds a succinct de Bruijn graph representing the short reads, and seeks a corrective sequence for each erroneous region in the long reads by traversing chosen paths in the graph. In comparison, LoRDEC is at least six times faster and requires at least 93% less memory or disk space than available tools, while achieving comparable accuracy. Availability and implementation: LoRDEC is written in C++, tested on Linux platforms and freely available at http://atgc.lirmm.fr/lordec. Contact: [email protected]. Supplementary information: Supplementary data are available at Bioinformatics online.
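The graph-traversal idea in the abstract above can be illustrated with a minimal sketch. It is not LoRDEC's implementation: the graph here is a plain Python set of k-mers rather than a succinct structure, the function names (`build_kmer_set`, `successors`, `find_paths`) are hypothetical, and the sketch only enumerates candidate corrective sequences between two k-mers that occur in the short reads. How a candidate is then chosen is left out.

```python
def build_kmer_set(reads, k):
    """Nodes of an (implicit) de Bruijn graph: every k-mer seen in the short reads."""
    return {r[i:i + k] for r in reads for i in range(len(r) - k + 1)}


def successors(kmer, kmers):
    """Graph edges: k-mers that overlap this one by k-1 characters."""
    return [kmer[1:] + b for b in "ACGT" if kmer[1:] + b in kmers]


def find_paths(source, target, kmers, max_steps):
    """Enumerate sequences spelled by paths from source to target
    that use at least one and at most max_steps edges."""
    results = []
    stack = [(source, source)]
    while stack:
        node, seq = stack.pop()
        steps = len(seq) - len(source)
        if node == target and steps > 0:
            results.append(seq)
            continue
        if steps < max_steps:
            for nxt in successors(node, kmers):
                stack.append((nxt, seq + nxt[-1]))
    return results


# Toy data (assumed for illustration): one accurate short read and a
# long read with a single substitution at index 4 (T -> A).
k = 4
kmers = build_kmer_set(["ACGTTGCAGT"], k)
long_read = "ACGTAGCAGT"
# The k-mers at positions 0 and 5 of the long read occur in the short reads.
source, target = long_read[0:k], long_read[5:5 + k]
candidates = find_paths(source, target, kmers, max_steps=6)
corrected = candidates[0] + long_read[5 + k:]
```

The step bound keeps the enumeration finite even when the graph contains cycles, at the cost of missing corrections that need a longer path.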
Bioinformatics | 2011
Leena Salmela; Veli Mäkinen; Niko Välimäki; Johannes Ylinen; Esko Ukkonen
Motivation: Assembling genomes from short read data has become increasingly popular, but the problem remains computationally challenging especially for larger genomes. We study the scaffolding phase of sequence assembly where preassembled contigs are ordered based on mate pair data. Results: We present MIP Scaffolder that divides the scaffolding problem into smaller subproblems and solves these with mixed integer programming. The scaffolding problem can be represented as a graph and the biconnected components of this graph can be solved independently. We present a technique for restricting the size of these subproblems so that they can be solved accurately with mixed integer programming. We compare MIP Scaffolder to two state-of-the-art methods, SOPRA and SSPACE. MIP Scaffolder is fast and produces scaffolds that are as good as or better than those of its competitors on large genomes. Availability: The source code of MIP Scaffolder is freely available at http://www.cs.helsinki.fi/u/lmsalmel/mip-scaffolder/. Contact: [email protected]
ACM Journal of Experimental Algorithms | 2007
Leena Salmela; Jorma Tarhio; Jari Kytöjoki
We present three algorithms for exact string matching of multiple patterns. Our algorithms are filtering methods, which apply q-grams and bit parallelism. We ran extensive experiments with them and compared them with various versions of earlier algorithms, e.g., different trie implementations of the Aho-Corasick algorithm. All of our algorithms appeared to be substantially faster than earlier solutions for sets of 1,000-10,000 patterns and the good performance of two of them continues to 100,000 patterns. The gain is because of the improved filtering efficiency caused by q-grams.
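As a rough illustration of filtering with q-grams and bit parallelism, the sketch below merges equal-length patterns into one bit mask per q-gram, runs a shift-and automaton over the text's q-grams, and verifies every candidate against the pattern set. It is a simplified example in the spirit of the abstract, not one of the paper's three algorithms; the function names and the equal-length restriction are assumptions made for brevity.

```python
def build_masks(patterns, q):
    """For each q-gram, set bit i if some pattern has that q-gram at offset i."""
    m = len(patterns[0])
    masks = {}
    for p in patterns:
        assert len(p) == m, "this sketch assumes equal-length patterns"
        for i in range(m - q + 1):
            gram = p[i:i + q]
            masks[gram] = masks.get(gram, 0) | (1 << i)
    return masks, m


def search(text, patterns, q=2):
    """Return (start, pattern) for every exact occurrence of any pattern."""
    masks, m = build_masks(patterns, q)
    pattern_set = set(patterns)
    accept = 1 << (m - q)          # bit of the last q-gram offset
    state = 0
    hits = []
    for j in range(len(text) - q + 1):
        gram = text[j:j + q]
        # Shift-and step over the q-gram alphabet (the filter).
        state = ((state << 1) | 1) & masks.get(gram, 0)
        if state & accept:
            start = j - (m - q)
            candidate = text[start:start + m]
            # Verification: the merged masks can accept mixtures of patterns.
            if candidate in pattern_set:
                hits.append((start, candidate))
    return hits


hits = search("zabcdaxyzw", ["abcd", "bcda", "xyzw"])
false_candidate = search("abcy", ["abcd", "xbcy"])
```

In the second call the filter accepts `abcy` because its q-grams `ab`, `bc` and `cy` each occur at the right offset in some pattern, and only the verification step rejects it. Longer q-grams make such false candidates rarer.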
symposium on experimental and efficient algorithms | 2010
Branislav Ďurian; Hannu Peltola; Leena Salmela; Jorma Tarhio
We present three bit-parallel algorithms for exact searching of long patterns. Two algorithms are modifications of the BNDM algorithm and the third one is a filtration method which utilizes locations of q-grams in the pattern. Two algorithms apply a condensed representation of q-grams. Practical experiments show that the new algorithms are competitive with earlier algorithms with or without bit-parallelism. The average time complexity of the algorithms is analyzed. Two of the algorithms are shown to be optimal on average.
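For readers unfamiliar with BNDM, here is a minimal sketch of the baseline algorithm for a single pattern. It shows plain BNDM only, not the modifications or the q-gram filtration method described in the abstract. Python integers are unbounded, so the machine-word limit that makes long patterns a problem in practice does not show up in this sketch.

```python
def bndm(text, pattern):
    """Return start positions of all exact occurrences of pattern in text."""
    m, n = len(pattern), len(text)
    if m == 0 or m > n:
        return []
    # Bit (m-1-i) of masks[c] is set when pattern[i] == c.
    masks = {}
    for i, c in enumerate(pattern):
        masks[c] = masks.get(c, 0) | (1 << (m - 1 - i))
    full = (1 << m) - 1
    top = 1 << (m - 1)
    occurrences = []
    pos = 0
    while pos <= n - m:
        j = m          # next window offset to read, scanning right to left
        last = m       # shift to apply when the window has been processed
        state = full
        while state:
            state &= masks.get(text[pos + j - 1], 0)
            j -= 1
            if state & top:
                if j > 0:
                    last = j               # window suffix is a pattern prefix
                else:
                    occurrences.append(pos)  # whole window matched
                    break
            state = (state << 1) & full
        pos += last
    return occurrences


matches = bndm("abracadabra", "abra")
```

The window is read backwards, and the scan stops as soon as the characters read so far are not a factor of the pattern, which is what lets the algorithm skip text positions on average.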
workshop on algorithms in bioinformatics | 2009
Eric Rivals; Leena Salmela; Petteri Kiiskinen; Petri Kalsi; Jorma Tarhio
With Next Generation Sequencers, sequence based transcriptomic or epigenomic assays yield millions of short sequence reads that need to be mapped back on a reference genome. The upcoming versions of these sequencers promise even higher sequencing capacities; this may turn the read mapping task into a bottleneck for which alternative pattern matching approaches must be explored. We present an algorithm and its implementation, called mpscan, which uses a sophisticated filtration scheme to match a set of patterns/reads exactly on a sequence. MPSCAN can search for millions of reads in a single pass through the genome without indexing its sequence. Moreover, we show that MPSCAN offers an optimal average time complexity, which is sublinear in the text length, meaning that it does not need to examine all sequence positions. Comparisons with BLAT-like tools and with six specialised read mapping programs (like BOWTIE or ZOOM) demonstrate that mpscan is also the fastest algorithm in practice for exact matching. Our accuracy and scalability comparisons reveal that some tools are inappropriate for read mapping. Moreover, we provide evidence suggesting that exact matching may be a valuable solution in some read mapping applications. As most read mapping programs somehow rely on exact matching procedures to perform approximate pattern mapping, the filtration scheme we experimented with may prove useful in the design of future algorithms. The absence of a genome index gives mpscan its low memory requirement and the flexibility to run on a desktop computer, and avoids time-consuming genome preprocessing.
combinatorial pattern matching | 2003
Jari Kytöjoki; Leena Salmela; Jorma Tarhio
We present three algorithms for exact string matching of multiple patterns. Our algorithms are filtering methods, which apply q-grams and bit parallelism. We ran extensive experiments with them and compared them with various versions of earlier algorithms, e.g. different trie implementations of the Aho-Corasick algorithm. Our algorithms proved to be substantially faster than earlier solutions for sets of 1,000-100,000 patterns. The gain is due to the improved filtering efficiency caused by q-grams.
Molecular Ecology | 2015
Jouni Kvist; Anniina L. K. Mattila; Panu Somervuo; Virpi Ahola; Patrik Koskinen; Lars Paulin; Leena Salmela; Toby Fountain; Pasi Rastas; Annukka Ruokolainen; Minna Taipale; Liisa Holm; Petri Auvinen; Rainer Lehtonen; Mikko J. Frilander; Ilkka Hanski
Insect flight is one of the most energetically demanding activities in the animal kingdom, yet for many insects flight is necessary for reproduction and foraging. Moreover, dispersal by flight is essential for the viability of species living in fragmented landscapes. Here, working on the Glanville fritillary butterfly (Melitaea cinxia), we use transcriptome sequencing to investigate gene expression changes caused by 15 min of flight in two contrasting populations and the two sexes. Male butterflies and individuals from a large metapopulation had significantly higher peak flight metabolic rate (FMR) than female butterflies and those from a small inbred population. In the pooled data, FMR was significantly positively correlated with genome‐wide heterozygosity, a surrogate of individual inbreeding. The flight experiment changed the expression level of 1513 genes, including genes related to major energy metabolism pathways, ribosome biogenesis and RNA processing, and stress and immune responses. Males and butterflies from the population with high FMR had higher basal expression of genes related to energy metabolism, whereas females and butterflies from the small population with low FMR had higher expression of genes related to ribosome/RNA processing and immune response. Following the flight treatment, genes related to energy metabolism were generally down‐regulated, while genes related to ribosome/RNA processing and immune response were up‐regulated. These results suggest that common molecular mechanisms respond to flight and can influence differences in flight metabolic capacity between populations and sexes.