Publication


Featured research published by Lars Arvestad.


Nature | 2013

The Norway spruce genome sequence and conifer genome evolution

Björn Nystedt; Nathaniel R. Street; Anna Wetterbom; Andrea Zuccolo; Yao-Cheng Lin; Douglas G. Scofield; Francesco Vezzi; Nicolas Delhomme; Stefania Giacomello; Andrey Alexeyenko; Riccardo Vicedomini; Kristoffer Sahlin; Ellen Sherwood; Malin Elfstrand; Lydia Gramzow; Kristina Holmberg; Jimmie Hällman; Olivier Keech; Lisa Klasson; Maxim Koriabine; Melis Kucukoglu; Max Käller; Johannes Luthman; Fredrik Lysholm; Totte Niittylä; Åke Olson; Nemanja Rilakovic; Carol Ritland; Josep A. Rosselló; Juliana Stival Sena

Conifers have dominated forests for more than 200 million years and are of huge ecological and economic importance. Here we present the draft assembly of the 20-gigabase genome of Norway spruce (Picea abies), the first available for any gymnosperm. The number of well-supported genes (28,354) is similar to the >100 times smaller genome of Arabidopsis thaliana, and there is no evidence of a recent whole-genome duplication in the gymnosperm lineage. Instead, the large genome size seems to result from the slow and steady accumulation of a diverse set of long-terminal repeat transposable elements, possibly owing to the lack of an efficient elimination mechanism. Comparative sequencing of Pinus sylvestris, Abies sibirica, Juniperus communis, Taxus baccata and Gnetum gnemon reveals that the transposable element diversity is shared among extant conifers. Expression of 24-nucleotide small RNAs, previously implicated in transposable element silencing, is tissue-specific and much lower than in other plants. We further identify numerous long (>10,000 base pairs) introns, gene-like fragments, uncharacterized long non-coding RNAs and short RNAs. This opens up new genomic avenues for conifer forestry and breeding.


Proceedings of the National Academy of Sciences of the United States of America | 2009

Simultaneous Bayesian gene tree reconstruction and reconciliation analysis

Örjan Åkerborg; Bengt Sennblad; Lars Arvestad; Jens Lagergren

We present GSR, a probabilistic model integrating gene duplication, sequence evolution, and a relaxed molecular clock for substitution rates, that enables genomewide analysis of gene families. The gene duplication and loss process is a major cause for incongruence between gene and species tree, and deterministic methods have been developed to explain such differences through tree reconciliations. Although probabilistic methods for phylogenetic inference have been around for decades, probabilistic reconciliation methods are far less established. Based on our model, we have implemented a Bayesian analysis tool, PrIME-GSR, for gene tree inference that takes a known species tree into account. Our implementation is sound and we demonstrate its utility for genomewide gene-family analysis by applying it to recently presented yeast data. We validate PrIME-GSR by comparing with previous analyses of these data that take advantage of gene order information. In a case study we apply our method to the ADH gene family and are able to draw biologically relevant conclusions concerning gene duplications creating key yeast phenotypes. On a higher level this shows the biological relevance of our method. The obtained results demonstrate the value of a relaxed molecular clock. Our good performance will extend to species where gene order conservation is insufficient.


Research in Computational Molecular Biology | 2004

Gene tree reconstruction and orthology analysis based on an integrated model for duplications and sequence evolution

Lars Arvestad; Ann-Charlotte Berglund; Jens Lagergren; Bengt Sennblad

Gene tree and species tree reconstruction, orthology analysis and reconciliation are problems important in multigenome-based comparative genomics and biology in general. In the present paper, we advance the frontier of these areas in several respects and provide important computational tools. First, exact algorithms are given for several probabilistic reconciliation problems with respect to the probabilistic gene evolution model, previously developed by the authors. Until now, those problems were solved by MCMC estimation algorithms. Second, we extend the gene evolution model to the gene sequence evolution model, by including sequence evolution. Third, we develop MCMC algorithms for the gene sequence evolution model that, given gene sequence data, allow: (1) orthology analysis, reconciliation analysis, and gene tree reconstruction, w.r.t. a species tree, that balances a likely/unlikely reconciliation and a likely/unlikely gene tree and (2) species tree reconstruction that balances a likely/unlikely reconciliation and a likely/unlikely gene tree. These MCMC algorithms take advantage of the exact algorithms for the gene evolution model. We have successfully tested our dynamic programming algorithms on real data for a biogeography problem. The MCMC algorithms perform very well both on synthetic and biological data.


Planta | 2005

The genome sequence of black cottonwood (Populus trichocarpa) reveals 18 conserved cellulose synthase (CesA) genes

Soraya Djerbi; Mats Lindskog; Lars Arvestad; Fredrik Sterky; Tuula T. Teeri

The genome sequence of Populus trichocarpa was screened for genes encoding cellulose synthases by using full-length cDNA sequences and ESTs previously identified in the tissue specific cDNA libraries of other poplars. The data obtained revealed 18 distinct CesA gene sequences in P. trichocarpa. The identified genes were grouped in seven gene pairs, one group of three sequences and one single gene. Evidence from gene expression studies of hybrid aspen suggests that both copies of at least one pair, CesA3-1 and CesA3-2, are actively transcribed. No sequences corresponding to the gene pair, CesA6-1 and CesA6-2, were found in Arabidopsis or hybrid aspen, while one homologous gene has been identified in the rice genome and an active transcript in Populus tremuloides. A phylogenetic analysis suggests that the CesA genes previously associated with secondary cell wall synthesis originate from a single ancestor gene and group in three distinct subgroups. The newly identified copies of CesA genes in P. trichocarpa give rise to a number of new questions concerning the mechanism of cellulose synthesis in trees.


PLOS Computational Biology | 2005

Genome-wide survey for biologically functional pseudogenes

Orjan Per Svensson; Lars Arvestad; Jens Lagergren

According to current estimates there exist about 20,000 pseudogenes in a mammalian genome. The vast majority of these are disabled and nonfunctional copies of protein-coding genes which, therefore, evolve neutrally. Recent findings that a Makorin1 pseudogene, residing on mouse Chromosome 5, is, indeed, in vivo vital and also evolutionarily preserved, encouraged us to conduct a genome-wide survey for other functional pseudogenes in human, mouse, and chimpanzee. We identify to our knowledge the first examples of conserved pseudogenes common to human and mouse, originating from one duplication predating the human–mouse species split and having evolved as pseudogenes since the species split. Functionality is one possible way to explain the apparently contradictory properties of such pseudogene pairs, i.e., high conservation and ancient origin. The hypothesis of functionality is tested by comparing expression evidence and synteny of the candidates with proper test sets. The tests suggest potential biological function. Our candidate set includes a small set of long-lived pseudogenes whose unknown potential function is retained since before the human–mouse species split, and also a larger group of primate-specific ones found from human–chimpanzee searches. Two processed sequences are notable, their conservation since the human–mouse split being as high as most protein-coding genes; one is derived from the protein Ataxin 7-like 3 (ATX7NL3), and one from the Spinocerebellar ataxia type 1 protein (ATX1). Our approach is comparative and can be applied to any pair of species. It is implemented by a semi-automated pipeline based on cross-species BLAST comparisons and maximum-likelihood phylogeny estimations. To separate pseudogenes from protein-coding genes, we use standard methods, utilizing in-frame disablements, as well as a probabilistic filter based on Ka/Ks ratios.


Journal of the ACM | 2009

The gene evolution model and computing its associated probabilities

Lars Arvestad; Jens Lagergren; Bengt Sennblad

Phylogeny is both a fundamental tool in biology and a rich source of fascinating modeling and algorithmic problems. Today's wealth of sequenced genomes makes it increasingly important to understand evolutionary events such as duplications, losses, transpositions, inversions, lateral transfers, and domain shuffling. We focus on the gene duplication event, which constitutes a major force in the creation of genes with new function [Ohno 1970; Lynch and Force 2000] and, thereby also, of biodiversity. We introduce the probabilistic gene evolution model, which describes how a gene tree evolves within a given species tree with respect to speciation, gene duplication, and gene loss. The actual relation between gene tree and species tree is captured by a reconciliation, a concept which we generalize for more expressiveness. The model is a canonical generalization of the classical linear birth-death process, obtained by replacing the interval where the process takes place by a tree. For the gene evolution model, we derive efficient algorithms for some associated probability distributions: the probability of a reconciled tree, the probability of a gene tree, the maximum probability reconciliation, the posterior probability of a reconciliation, and sampling reconciliations with respect to the posterior probability. These algorithms provide the basis for several applications, including species tree construction, reconciliation analysis, orthology analysis, biogeography, and host-parasite co-evolution.
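
The abstract above builds on the classical linear birth-death process. As a rough illustration only, the following Python sketch simulates that process on a single interval (one edge), not on a species tree as the paper's model does. The function name, parameters, and the Gillespie-style simulation are assumptions made for this example and are not taken from the paper.

```python
import random


def simulate_birth_death(dup_rate, loss_rate, duration, rng, start=1):
    """Simulate a linear birth-death process on one interval.

    Hypothetical helper for illustration: each lineage duplicates at
    `dup_rate` and is lost at `loss_rate`. Returns the number of
    lineages present after `duration` time units.
    """
    n = start
    t = 0.0
    per_lineage_rate = dup_rate + loss_rate
    while n > 0 and per_lineage_rate > 0:
        # Waiting time to the next event is exponential in the total rate.
        t += rng.expovariate(n * per_lineage_rate)
        if t > duration:
            break
        # Choose duplication or loss in proportion to the two rates.
        if rng.random() < dup_rate / per_lineage_rate:
            n += 1
        else:
            n -= 1
    return n


if __name__ == "__main__":
    rng = random.Random(42)
    counts = [simulate_birth_death(1.0, 0.5, 2.0, rng) for _ in range(10)]
    print(counts)
```

For a process started with one lineage, the expected lineage count after time t is exp((dup_rate - loss_rate) * t), which gives a simple sanity check when averaging many runs.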


Plant Physiology | 2008

MAP20, a Microtubule-Associated Protein in the Secondary Cell Walls of Hybrid Aspen, Is a Target of the Cellulose Synthesis Inhibitor 2,6-Dichlorobenzonitrile

Alex S. Rajangam; Manoj Kumar; Henrik Aspeborg; Gea Guerriero; Lars Arvestad; Podjamas Pansri; Christian Brown; Sophia Hober; Kristina Blomqvist; Christina Divne; Ines Ezcurra; Ewa J. Mellerowicz; Björn Sundberg; Vincent Bulone; Tuula T. Teeri

We have identified a gene, denoted PttMAP20, which is strongly up-regulated during secondary cell wall synthesis and tightly coregulated with the secondary wall-associated CESA genes in hybrid aspen (Populus tremula × tremuloides). Immunolocalization studies with affinity-purified antibodies specific for PttMAP20 revealed that the protein is found in all cell types in developing xylem and that it is most abundant in cells forming secondary cell walls. This PttMAP20 protein sequence contains a highly conserved TPX2 domain first identified in a microtubule-associated protein (MAP) in Xenopus laevis. Overexpression of PttMAP20 in Arabidopsis (Arabidopsis thaliana) leads to helical twisting of epidermal cells, frequently associated with MAPs. In addition, a PttMAP20-yellow fluorescent protein fusion protein expressed in tobacco (Nicotiana tabacum) leaves localizes to microtubules in leaf epidermal pavement cells. Recombinant PttMAP20 expressed in Escherichia coli also binds specifically to in vitro-assembled, taxol-stabilized bovine microtubules. Finally, the herbicide 2,6-dichlorobenzonitrile, which inhibits cellulose synthesis in plants, was found to bind specifically to PttMAP20. Together with the known function of cortical microtubules in orienting cellulose microfibrils, these observations suggest that PttMAP20 has a role in cellulose biosynthesis.


BMC Bioinformatics | 2014

BESST - Efficient scaffolding of large fragmented assemblies

Kristoffer Sahlin; Francesco Vezzi; Björn Nystedt; Joakim Lundeberg; Lars Arvestad

Background: The use of short reads from High Throughput Sequencing (HTS) techniques is now commonplace in de novo assembly. Yet, obtaining contiguous assemblies from short reads is challenging, thus making scaffolding an important step in the assembly pipeline. Different algorithms have been proposed but many of them use the number of read pairs supporting a linking of two contigs as an indicator of reliability. This reasoning is intuitive, but fails to account for variation in link count due to contig features. We have also noted that published scaffolders are only evaluated on small datasets using output from only one assembler. Two issues arise from this. Firstly, some of the available tools are not well suited for complex genomes. Secondly, these evaluations provide little support for inferring a software’s general performance. Results: We propose a new algorithm, implemented in a tool called BESST, which can scaffold genomes of all sizes and complexities and was used to scaffold the genome of P. abies (20 Gbp). We performed a comprehensive comparison of BESST against the most popular stand-alone scaffolders on a large variety of datasets. Our results confirm that some of the popular scaffolders are not practical to run on complex datasets. Furthermore, no single stand-alone scaffolder outperforms the others on all datasets. However, BESST fares favorably to the other tested scaffolders on GAGE datasets and, moreover, outperforms the other methods when library insert size distribution is wide. Conclusion: We conclude from our results that information sources other than the quantity of links, as is commonly used, can provide useful information about genome structure when scaffolding.


Systematic Biology | 2014

A Bayesian Method for Analyzing Lateral Gene Transfer

Joel Sjöstrand; Ali Tofigh; Vincent Daubin; Lars Arvestad; Bengt Sennblad; Jens Lagergren

Lateral gene transfer (LGT)--which transfers DNA between two non-vertically related individuals belonging to the same or different species--is recognized as a major force in prokaryotic evolution, and evidence of its impact on eukaryotic evolution is ever increasing. LGT has attracted much public attention for its potential to transfer pathogenic elements and antibiotic resistance in bacteria, and to transfer pesticide resistance from genetically modified crops to other plants. In a wider perspective, there is a growing body of studies highlighting the role of LGT in enabling organisms to occupy new niches or adapt to environmental changes. The challenge LGT poses to the standard tree-based conception of evolution is also being debated. Studies of LGT have, however, been severely limited by a lack of computational tools. The best currently available LGT algorithms are parsimony-based phylogenetic methods, which require a pre-computed gene tree and cannot choose between sometimes wildly differing most parsimonious solutions. Moreover, in many studies, simple heuristics are applied that can only handle putative orthologs and completely disregard gene duplications (GDs). Consequently, proposed LGT among specific gene families, and the rate of LGT in general, remain debated. We present a Bayesian Markov-chain Monte Carlo-based method that integrates GD, gene loss, LGT, and sequence evolution, and apply the method in a genome-wide analysis of two groups of bacteria: Mollicutes and Cyanobacteria. Our analyses show that although the LGT rate between distant species is high, the net combined rate of duplication and close-species LGT is on average higher. We also show that the common practice of disregarding reconcilability in gene tree inference overestimates the number of LGT and duplication events.


Bioinformatics | 2010

Classification of DNA sequences using Bloom filters

Henrik Stranneheim; Max Käller; Tobias Allander; Björn Andersson; Lars Arvestad; Joakim Lundeberg

Motivation: New generation sequencing technologies producing increasingly complex datasets demand new efficient and specialized sequence analysis algorithms. Often, it is only the ‘novel’ sequences in a complex dataset that are of interest and the superfluous sequences need to be removed. Results: A novel algorithm, fast and accurate classification of sequences (FACS), is introduced that can accurately and rapidly classify sequences as belonging or not belonging to a reference sequence. FACS was first optimized and validated using a synthetic metagenome dataset. An experimental metagenome dataset was then used to show that FACS achieves accuracy comparable to BLAT and SSAHA2 but is at least 21 times faster in classifying sequences. Availability: Source code for FACS, Bloom filters and MetaSim dataset used is available at http://facs.biotech.kth.se. The Bloom::Faster 1.6 Perl module can be downloaded from CPAN at http://search.cpan.org/~palvaro/Bloom-Faster-1.6/ Supplementary information: Supplementary data are available at Bioinformatics online.
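
The general idea of classifying sequences against a reference with a Bloom filter can be sketched as follows. This is a minimal illustration in Python and is not the FACS implementation (which the abstract associates with a Perl module). The k-mer length, filter size, number of hash functions, match threshold, and all function names are assumptions chosen for the example.

```python
import hashlib


class BloomFilter:
    """Fixed-size Bloom filter using double hashing over a SHA-256 digest."""

    def __init__(self, num_bits, num_hashes):
        self.num_bits = num_bits
        self.num_hashes = num_hashes
        self.bits = bytearray((num_bits + 7) // 8)

    def _positions(self, item):
        digest = hashlib.sha256(item.encode("ascii")).digest()
        h1 = int.from_bytes(digest[:8], "big")
        h2 = int.from_bytes(digest[8:16], "big") | 1  # odd step, never zero
        for i in range(self.num_hashes):
            yield (h1 + i * h2) % self.num_bits

    def add(self, item):
        for pos in self._positions(item):
            self.bits[pos // 8] |= 1 << (pos % 8)

    def __contains__(self, item):
        return all(
            self.bits[pos // 8] & (1 << (pos % 8))
            for pos in self._positions(item)
        )


def kmers(seq, k):
    """Yield every overlapping substring of length k."""
    for i in range(len(seq) - k + 1):
        yield seq[i:i + k]


def build_reference_filter(reference, k, num_bits=1 << 20, num_hashes=4):
    """Insert all k-mers of the reference sequence into a Bloom filter."""
    bf = BloomFilter(num_bits, num_hashes)
    for kmer in kmers(reference.upper(), k):
        bf.add(kmer)
    return bf


def match_fraction(read, bf, k):
    """Fraction of the read's k-mers that the filter reports as present."""
    hits = total = 0
    for kmer in kmers(read.upper(), k):
        total += 1
        hits += kmer in bf
    return hits / total if total else 0.0


def classify(read, bf, k, threshold=0.8):
    """Label a read as belonging to the reference if enough k-mers match."""
    return match_fraction(read, bf, k) >= threshold


if __name__ == "__main__":
    reference = "ACGTTGCAAGGCTTAGCCGATCGATTACGGCATCGA"
    bf = build_reference_filter(reference, k=8)
    print(classify(reference[5:25], bf, k=8))
    print(classify("TTTTTTTTTTTTTTTTTTTT", bf, k=8))
```

A Bloom filter never reports a false negative, so every k-mer taken from the reference is found. False positives are possible, which is why the sketch applies a threshold on the fraction of matching k-mers rather than requiring a single hit.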

Collaboration


Dive into Lars Arvestad's collaborations.

Top Co-Authors

Jens Lagergren

Royal Institute of Technology

Kristoffer Sahlin

Royal Institute of Technology

Joakim Lundeberg

Royal Institute of Technology

Peter Savolainen

Royal Institute of Technology

Raja Hashim Ali

Royal Institute of Technology

Tuula T. Teeri

Royal Institute of Technology

Francesco Vezzi

Royal Institute of Technology

Mehmood Alam Khan

Royal Institute of Technology

Sayyed Auwn Muhammad

Royal Institute of Technology
