Publication


Featured research published by Liisa Holm.


Nucleic Acids Research | 2012

The Pfam protein families database

Marco Punta; Penny Coggill; Ruth Y. Eberhardt; Jaina Mistry; John G. Tate; Chris Boursnell; Kristoffer Forslund; Goran Ceric; Jody Clements; Andreas Heger; Liisa Holm; Erik L. L. Sonnhammer; Sean R. Eddy; Alex Bateman; Robert D. Finn

Pfam is a widely used database of protein families, currently containing more than 13 000 manually curated protein families as of release 26.0. Pfam is available via servers in the UK (http://pfam.sanger.ac.uk/), the USA (http://pfam.janelia.org/) and Sweden (http://pfam.sbc.su.se/). Here, we report on changes that have occurred since our 2010 NAR paper (release 24.0). Over the last 2 years, we have generated 1840 new families and increased coverage of the UniProt Knowledgebase (UniProtKB) to nearly 80%. Notably, we have taken the step of opening up the annotation of our families to the Wikipedia community, by linking Pfam families to relevant Wikipedia pages and encouraging the Pfam and Wikipedia communities to improve and expand those pages. We continue to improve the Pfam website and add new visualizations, such as the ‘sunburst’ representation of taxonomic distribution of families. In this work we additionally address two topics that will be of particular interest to the Pfam community. First, we explain the definition and use of family-specific, manually curated gathering thresholds. Second, we discuss some of the features of domains of unknown function (also known as DUFs), which constitute a rapidly growing class of families within Pfam.


Science | 1996

Mapping the Protein Universe

Liisa Holm; Chris Sander

The comparison of the three-dimensional shapes of protein molecules poses a complex algorithmic problem. Its solution provides biologists with computational tools to organize the rapidly growing set of thousands of known protein shapes, to identify new types of protein architecture, and to discover unexpected evolutionary relations, reaching back billions of years, between protein molecules. Protein shape comparison also improves tools for identifying gene functions in genome databases by defining the essential sequence-structure features of a protein family. Finally, an exhaustive all-on-all shape comparison provides a map of physical attractor regions in the abstract shape space of proteins, with implications for the processes of protein folding and evolution.


Nucleic Acids Research | 1998

Touring protein fold space with Dali/FSSP

Liisa Holm; Chris Sander

The FSSP database and its new supplement, the Dali Domain Dictionary, present a continuously updated classification of all known 3D protein structures. The classification is derived using an automatic structure alignment program (Dali) for the all-against-all comparison of structures in the Protein Data Bank. From the resulting enumeration of structural neighbours (which form a surprisingly continuous distribution in fold space) we derive a discrete fold classification in three steps: (i) sequence-related families are covered by a representative set of protein chains; (ii) protein chains are decomposed into structural domains based on the recurrence of structural motifs; (iii) folds are defined as tight clusters of domains in fold space. The fold classification, domain definitions and test sets for sequence-structure alignment (threading) are accessible on the web at www.embl-ebi.ac.uk/dali. The web interface provides a rich network of links between neighbours in fold space, between domains and proteins, and between structures and sequences leading, for example, to a database of explicit multiple alignments of protein families in the twilight zone of sequence similarity. The Dali/FSSP organization of protein structures provides a map of the currently known regions of the protein universe that is useful for the analysis of folding principles, for the evolutionary unification of protein families and for maximizing the information return from experimental structure determination.


Proteins | 1997

An evolutionary treasure: unification of a broad set of amidohydrolases related to urease

Liisa Holm; Chris Sander

The recent determination of the three‐dimensional structure of urease revealed striking similarities of enzyme architecture to adenosine deaminase and phosphotriesterase, evidence of a distant evolutionary relationship that had gone undetected by one‐dimensional sequence comparisons. Here, based on an analysis of conservation patterns in three dimensions, we report the discovery of the same active‐site architecture in an even larger set of enzymes involved primarily in nucleotide metabolism. As a consequence, we predict the three‐dimensional fold and details of the active site architecture for dihydroorotases, allantoinases, hydantoinases, AMP‐, adenine and cytosine deaminases, imidazolonepropionase, aryldialkylphosphatase, chlorohydrolases, formylmethanofuran dehydrogenases, and proteins involved in animal neuronal development. Two member families are common to archaea, eubacteria, and eukaryota. Thirteen other functions supported by the same structural motif and conserved chemical mechanism apparently represent later adaptations for different substrate specificities in different cellular contexts.


Nucleic Acids Research | 1997

Dali/FSSP classification of three-dimensional protein folds

Liisa Holm; Chris Sander

The FSSP database presents a continuously updated structural classification of three-dimensional protein folds. It is derived using an automatic structure comparison program (Dali) for the all-against-all comparison of over 6000 three-dimensional coordinate sets in the Protein Data Bank (PDB). Sequence-related protein families are covered by a representative set of 813 protein chains. Hierarchical clustering based on structural similarities yields a fold tree that defines 253 fold classes. For each representative protein chain, there is a database entry containing structure-structure alignments with its structural neighbours in the PDB. The database is accessible online through World Wide Web browsers and by anonymous ftp (file transfer protocol). The overview of fold space and the individual data sets provide a rich source of information for the study of both divergent and convergent aspects of molecular evolution, and define useful test sets and a standard of truth for assessing the correctness of sequence-sequence or sequence-structure alignments.
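The clustering step mentioned in this abstract can be illustrated with a small agglomerative sketch. It is not the FSSP procedure: the average-linkage rule, the pairwise similarity values, the cut-off `threshold`, and the function name `average_linkage_clusters` are all assumptions chosen for illustration.

```python
from typing import Dict, List, Sequence, Tuple


def average_linkage_clusters(
    names: Sequence[str],
    similarity: Dict[Tuple[str, str], float],
    threshold: float,
) -> List[List[str]]:
    """Merge the most similar clusters until no pair reaches `threshold`."""
    clusters: List[List[str]] = [[n] for n in names]

    def sim(a: str, b: str) -> float:
        # Similarities are symmetric; missing pairs count as 0.0 (assumption).
        key = (a, b) if (a, b) in similarity else (b, a)
        return similarity.get(key, 0.0)

    def linkage(c1: List[str], c2: List[str]) -> float:
        # Average similarity over all cross-cluster pairs.
        return sum(sim(a, b) for a in c1 for b in c2) / (len(c1) * len(c2))

    while len(clusters) > 1:
        score, i, j = max(
            (linkage(clusters[i], clusters[j]), i, j)
            for i in range(len(clusters))
            for j in range(i + 1, len(clusters))
        )
        if score < threshold:
            break  # remaining clusters are too dissimilar to merge
        merged = clusters[i] + clusters[j]
        clusters = [c for k, c in enumerate(clusters) if k not in (i, j)]
        clusters.append(merged)
    return [sorted(c) for c in clusters]
```

Each surviving cluster plays the role of one fold class in this toy setting; recording the order of merges instead of stopping at a threshold would give the corresponding tree.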


Journal of Molecular Biology | 1991

Database algorithm for generating protein backbone and side-chain co-ordinates from a Cα trace: Application to model building and detection of co-ordinate errors

Liisa Holm; Chris Sander

The problem of constructing all-atom model co-ordinates of a protein from an outline of the polypeptide chain is encountered in protein structure determination by crystallography or nuclear magnetic resonance spectroscopy, in model building by homology and in protein design. Here, we present an automatic procedure for generating full protein co-ordinates (backbone and, optionally, side-chains) given the Cα trace and amino acid sequence. To construct backbones, a protein structure database is first scanned for fragments that locally fit the chain trace according to distance criteria. A best path algorithm then sifts through these segments and selects an optimal path with minimal mismatch at fragment joints. In blind tests, using fully known protein structures, backbones (Cα, C, N, O) can be reconstructed with a reliability of 0.4 to 0.6 Å root-mean-square position deviation and not more than 0 to 5% peptide flips. This accuracy is sufficient to identify possible errors in protein co-ordinate sets. To construct full co-ordinates, side-chains are added from a library of frequently occurring rotamers using a simple and fast Monte Carlo procedure with simulated annealing. In tests on X-ray structures determined at better than 2.5 Å resolution, the positions of side-chain atoms in the protein core (less than 20% relative accessibility) have an accuracy of 1.6 Å (r.m.s. deviation) and 70% of χ1 angles are within 30° of the X-ray structure. The computer program MaxSprout is available on request.
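The best-path step described in this abstract can be pictured as a small dynamic program. The following is a minimal sketch under stated assumptions, not the MaxSprout implementation: fragments are represented by plain string labels, the `mismatch` cost function is supplied by the caller, and the name `best_fragment_path` is hypothetical.

```python
from typing import Callable, List, Sequence, Tuple


def best_fragment_path(
    candidates: Sequence[Sequence[str]],
    mismatch: Callable[[str, str], float],
) -> Tuple[float, List[str]]:
    """Choose one fragment per position, minimising summed joint mismatch."""
    if not candidates or any(len(c) == 0 for c in candidates):
        raise ValueError("every position needs at least one candidate fragment")

    # cost[j]: cheapest total mismatch of any path ending in candidates[i][j]
    cost = [0.0] * len(candidates[0])
    back: List[List[int]] = []  # back[i-1][j]: best predecessor index at i-1

    for i in range(1, len(candidates)):
        new_cost: List[float] = []
        pointers: List[int] = []
        for frag in candidates[i]:
            options = [
                cost[k] + mismatch(prev, frag)
                for k, prev in enumerate(candidates[i - 1])
            ]
            k_best = min(range(len(options)), key=options.__getitem__)
            new_cost.append(options[k_best])
            pointers.append(k_best)
        cost = new_cost
        back.append(pointers)

    # Trace back from the cheapest final fragment.
    j = min(range(len(cost)), key=cost.__getitem__)
    total = cost[j]
    path = [candidates[-1][j]]
    for i in range(len(candidates) - 1, 0, -1):
        j = back[i - 1][j]
        path.append(candidates[i - 1][j])
    path.reverse()
    return total, path
```

In a realistic setting the mismatch would measure how poorly the atoms of two overlapping fragments agree at their joint; here it is left abstract so that the path selection itself stays visible.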


Proteins | 2000

Rapid automatic detection and alignment of repeats in protein sequences

Andreas Heger; Liisa Holm

Many large proteins have evolved by internal duplication and many internal sequence repeats correspond to functional and structural units. We have developed an automatic algorithm, RADAR, for segmenting a query sequence into repeats. The segmentation procedure has three steps: (i) repeat length is determined by the spacing between suboptimal self‐alignment traces; (ii) repeat borders are optimized to yield a maximal integer number of repeats, and (iii) distant repeats are validated by iterative profile alignment. The method identifies short composition biased as well as gapped approximate repeats and complex repeat architectures involving many different types of repeats in the query sequence. No manual intervention and no prior assumptions on the number and length of repeats are required. Comparison to the Pfam‐A database indicates good coverage, accurate alignments, and reasonable repeat borders. Screening the Swissprot database revealed 3,000 repeats not annotated in existing domain databases. A number of these repeats had been described in the literature but most were novel. This illustrates how in times when curated databases grapple with ever increasing backlogs, automatic (re)analysis of sequences provides an efficient way to capture this important information. Proteins 2000;41:224–237.
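The idea of reading the repeat length off a self-comparison can be shown with a deliberately simplified stand-in. RADAR uses the spacing between suboptimal self-alignment traces; the sketch below instead counts identical residues along each diagonal of the sequence compared with itself and picks the best-scoring offset. The function name `estimate_repeat_length` and the `min_len` parameter are assumptions, and gaps are not handled.

```python
def estimate_repeat_length(seq: str, min_len: int = 2) -> int:
    """Return the offset whose self-comparison diagonal has the highest
    fraction of identical residues, or 0 if no offset shows any identity."""
    best_offset = 0
    best_score = 0.0
    # Offsets above half the sequence length cannot hold two full repeats.
    for offset in range(min_len, len(seq) // 2 + 1):
        compared = len(seq) - offset
        matches = sum(1 for i in range(compared) if seq[i] == seq[i + offset])
        score = matches / compared
        # Strict comparison keeps the shortest period when multiples tie.
        if score > best_score:
            best_offset, best_score = offset, score
    return best_offset
```

For a sequence built from approximate copies of one unit, the winning offset corresponds to the unit length; a production method would need alignment with gaps and a significance test, as the abstract indicates.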


Proteins | 1998

Dictionary of recurrent domains in protein structures

Liisa Holm; Chris Sander

The rapid growth in the number of experimentally determined three‐dimensional protein structures has sharpened the need for comprehensive and up‐to‐date surveys of known structures. Classic work on protein structure classification has made it clear that a structural survey is best carried out at the level of domains, i.e., substructures that recur in evolution as functional units in different protein contexts. We present a method for automated domain identification from protein structure atomic coordinates based on quantitative measures of compactness and, as the new element, recurrence. Compactness criteria are used to recursively divide a protein into a series of successively smaller and smaller substructures. Recurrence criteria are used to select an optimal size level of these substructures, so that many of the chosen substructures are common to different proteins at a high level of statistical significance. The joint application of these criteria automatically yields consistent domain definitions between remote homologs, a result difficult to achieve using compactness criteria alone. The method is applied to a representative set of 1,137 sequence‐unique protein families covering 6,500 known structures. Clustering of the resulting set of domains (substructures) yields 594 distinct fold classes (types of substructures). The Dali Domain Dictionary (http://www.embl-ebi.ac.uk/dali) not only provides a global structural classification, but also a comprehensive description of families of protein sequences grouped around representative proteins of known structure. The classification will be continuously updated and can serve as a basis for improving our understanding of protein evolution and function and for evolving optimal strategies to complete the map of all natural protein structures. Proteins 33:88–96, 1998.


Proteins | 1997

Predicting protein structure using hidden Markov models

Kevin Karplus; Kimmen Sjolander; Christian Barrett; Melissa S. Cline; David Haussler; Richard Hughey; Liisa Holm

We discuss how methods based on hidden Markov models performed in the fold‐recognition section of the CASP2 experiment. Hidden Markov models were built for a representative set of just over 1,000 structures from the Protein Data Bank (PDB). Each CASP2 target sequence was scored against this library of HMMs. In addition, an HMM was built for each of the target sequences and all of the sequences in PDB were scored against that target model, with a good score on both methods indicating a high probability that the target sequence is homologous to the structure. The method worked well in comparison to other methods used at CASP2 for targets of moderate difficulty, where the closest structure in PDB could be aligned to the target with at least 15% residue identity. Proteins, Suppl. 1:134–139, 1997.


Nature Structural & Molecular Biology | 2001

Identification of homology in protein structure classification

Sabine Dietmann; Liisa Holm

Structural biology and structural genomics are expected to produce many three-dimensional protein structures in the near future. Each new structure raises questions about its function and evolution. Correct functional and evolutionary classification of a new structure is difficult for distantly related proteins and error-prone using simple statistical scores based on sequence or structure similarity. Here we present an accurate numerical method for the identification of evolutionary relationships (homology). The method is based on the principle that natural selection maintains structural and functional continuity within a diverging protein family. The problem of different rates of structural divergence between different families is solved by first using structural similarities to produce a global map of folds in protein space and then further subdividing fold neighborhoods into superfamilies based on functional similarities. In a validation test against a classification by human experts (SCOP), 77% of homologous pairs were identified with 92% reliability. The method is fully automated, allowing fast, self-consistent and complete classification of large numbers of protein structures. In particular, the discrimination between analogy and homology of close structural neighbors will lead to functional predictions while avoiding overprediction.
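The two validation figures quoted in this abstract can be computed from a predicted and a reference set of homologous pairs. The sketch below assumes that the first figure is coverage (the fraction of reference pairs that were recovered) and the second is reliability (the fraction of predicted pairs that are correct); the abstract does not spell out these definitions, and the function names are hypothetical.

```python
from typing import FrozenSet, Iterable, Set, Tuple

Pair = FrozenSet[str]


def as_pairs(pairs: Iterable[Tuple[str, str]]) -> Set[Pair]:
    """Treat (a, b) and (b, a) as the same unordered pair."""
    return {frozenset(p) for p in pairs}


def coverage_and_reliability(
    predicted: Iterable[Tuple[str, str]],
    reference: Iterable[Tuple[str, str]],
) -> Tuple[float, float]:
    """Return (coverage, reliability) of predicted pairs against a reference."""
    pred, ref = as_pairs(predicted), as_pairs(reference)
    if not pred or not ref:
        raise ValueError("both pair sets must be non-empty")
    true_positives = len(pred & ref)
    coverage = true_positives / len(ref)      # share of reference pairs found
    reliability = true_positives / len(pred)  # share of predictions confirmed
    return coverage, reliability
```

Under these assumed definitions, the reported result would correspond to a coverage of 0.77 and a reliability of 0.92 against the SCOP classification.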

Collaboration


Top Co-Authors

Andreas Heger | Wellcome Trust Sanger Institute
Jong Park | European Bioinformatics Institute
Sabine Dietmann | European Bioinformatics Institute
Cyrus Chothia | Laboratory of Molecular Biology
Gert Vriend | Radboud University Nijmegen
Tuula T. Teeri | Royal Institute of Technology
Peer Bork | University of Würzburg
Alex Bateman | European Bioinformatics Institute