Hans-Henrik Stærfeldt

Publication

Featured research published by Hans-Henrik Stærfeldt.

Nucleic Acids Research | 2007

RNAmmer: consistent and rapid annotation of ribosomal RNA genes

Karin Lagesen; Peter F. Hallin; Einar Andreas Rødland; Hans-Henrik Stærfeldt; Torbjørn Rognes; David W. Ussery

The publication of a complete genome sequence is usually accompanied by annotations of its genes. In contrast to protein coding genes, genes for ribosomal RNA (rRNA) are often poorly or inconsistently annotated. This makes comparative studies based on rRNA genes difficult. We have therefore created computational predictors for the major rRNA species from all kingdoms of life and compiled them into a program called RNAmmer. The program uses hidden Markov models trained on data from the 5S ribosomal RNA database and the European ribosomal RNA database project. A pre-screening step makes the method fast with little loss of sensitivity, enabling the analysis of a complete bacterial genome in less than a minute. Results from running RNAmmer on a large set of genomes indicate that the location of rRNAs can be predicted with a very high level of accuracy. Novel, unannotated rRNAs are also predicted in many genomes. The software as well as the genome analysis results are available at the CBS web server.

Journal of Molecular Biology | 2002

Prediction of human protein function from post-translational modifications and localization features

Lars Juhl Jensen; Ramneek Gupta; Nikolaj Blom; D. Devos; J. Tamames; Can Keşmir; Henrik Nielsen; Hans-Henrik Stærfeldt; Kristoffer Rapacki; Christopher T. Workman; Claus A. F. Andersen; Steen Knudsen; Anders Krogh; Alfonso Valencia; Søren Brunak

We have developed an entirely sequence-based method that identifies and integrates relevant features that can be used to assign proteins of unknown function to functional classes, and enzyme categories for enzymes. We show that strategies for the elucidation of protein function may benefit from a number of functional attributes that are more directly related to the linear sequence of amino acids, and hence easier to predict, than protein structure. These attributes include features associated with post-translational modifications and protein sorting, but also much simpler aspects such as the length, isoelectric point and composition of the polypeptide chain.
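The "much simpler aspects" mentioned in the abstract, such as chain length and amino acid composition, can be computed directly from a sequence. The following is a minimal illustrative sketch only, not the authors' method: the function name `simple_sequence_features` and the example sequence are hypothetical, and the isoelectric point and the post-translational modification and sorting features are deliberately left out.

```python
from collections import Counter
from typing import Dict

# The 20 standard amino acids, one-letter codes.
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def simple_sequence_features(sequence: str) -> Dict[str, object]:
    """Return the length and fractional amino acid composition of a protein.

    Hypothetical helper for illustration; not taken from the paper.
    """
    seq = sequence.upper()
    length = len(seq)
    counts = Counter(seq)
    # Fraction of the chain made up by each standard residue.
    composition = {
        aa: (counts[aa] / length if length else 0.0) for aa in AMINO_ACIDS
    }
    return {"length": length, "composition": composition}


if __name__ == "__main__":
    features = simple_sequence_features("MKTAYIAKQR")  # made-up sequence
    print(features["length"])
    print(features["composition"]["K"])
```

Features of this kind could then be passed, alongside predicted modification and localization features, to whatever classifier assigns the functional classes.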

Genome Biology | 2007

Porcine transcriptome analysis based on 97 non-normalized cDNA libraries and assembly of 1,021,891 expressed sequence tags

Jan Gorodkin; Susanna Cirera; Jakob Hedegaard; Michael J. Gilchrist; Frank Panitz; Claus Jørgensen; Karsten Scheibye-Knudsen; Troels Arvin; Steen Lumholdt; Milena Sawera; Trine Green; Bente Nielsen; Jakob Hull Havgaard; Carina Rosenkilde; Jun-Jun Wang; Heng Li; Ruiqiang Li; Bin Liu; Songnian Hu; Wei Dong; Wei Li; Jun Qing Yu; Jian Wang; Hans-Henrik Stærfeldt; Rasmus Wernersson; Lone Madsen; Bo Thomsen; Henrik Hornshøj; Zhan Bujie; Xuegang Wang

Background: Knowledge of the structure of gene expression is essential for mammalian transcriptomics research. We analyzed a collection of more than one million porcine expressed sequence tags (ESTs), of which two-thirds were generated in the Sino-Danish Pig Genome Project and one-third are from public databases. The Sino-Danish ESTs were generated from one normalized and 97 non-normalized cDNA libraries representing 35 different tissues and three developmental stages.

Results: Using the Distiller package, the ESTs were assembled to roughly 48,000 contigs and 73,000 singletons, of which approximately 25% have a high confidence match to UniProt. Approximately 6,000 new porcine gene clusters were identified. Expression analysis based on the non-normalized libraries resulted in the following findings. The distribution of cluster sizes is scaling invariant. Brain and testes are among the tissues with the greatest number of different expressed genes, whereas tissues with more specialized function, such as developing liver, have fewer expressed genes. There are at least 65 high confidence housekeeping gene candidates and 876 cDNA library-specific gene candidates. We identified differential expression of genes between different tissues, in particular brain/spinal cord, and found patterns of correlation between genes that share expression in pairs of libraries. Finally, there was remarkable agreement in expression between specialized tissues according to Gene Ontology categories.

Conclusion: This EST collection, the largest to date in pig, represents an essential resource for annotation, comparative genomics, assembly of the pig genome sequence, and further porcine transcription studies.

Standards in Genomic Sciences | 2009

GeneWiz browser: An Interactive Tool for Visualizing Sequenced Chromosomes

Peter F. Hallin; Hans-Henrik Stærfeldt; Eva Rotenberg; Tim T. Binnewies; Craig J. Benham; David W. Ussery

We present an interactive web application for visualizing genomic data of prokaryotic chromosomes. The tool (GeneWiz browser) allows users to carry out various analyses such as mapping alignments of homologous genes to other genomes, mapping of short sequencing reads to a reference chromosome, and calculating DNA properties such as curvature or stacking energy along the chromosome. The GeneWiz browser produces an interactive graphic that enables zooming from a global scale down to single nucleotides, without changing the size of the plot. Its ability to disproportionally zoom provides optimal readability and increased functionality compared to other browsers. The tool allows the user to select the display of various genomic features, color setting and data ranges. Custom numerical data can be added to the plot allowing, for example, visualization of gene expression and regulation data. Further, standard atlases are pre-generated for all prokaryotic genomes available in GenBank, providing a fast overview of all available genomes, including recently deposited genome sequences. The tool is available online from http://www.cbs.dtu.dk/services/gwBrowser. Supplemental material including interactive atlases is available online at http://www.cbs.dtu.dk/services/gwBrowser/suppl/.
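Calculating a DNA property "along the chromosome" usually means evaluating it in a sliding window. The sketch below illustrates that idea only: it uses GC fraction as a simple stand-in property rather than curvature or stacking energy, and the function name `windowed_gc` and its parameters are hypothetical, not part of the GeneWiz browser.

```python
from typing import List, Tuple


def windowed_gc(sequence: str, window: int, step: int) -> List[Tuple[int, float]]:
    """Return (start position, GC fraction) for each window along a sequence.

    GC fraction is a stand-in here; curvature or stacking energy would need
    their own per-window scoring functions.
    """
    if window <= 0 or step <= 0:
        raise ValueError("window and step must be positive")
    seq = sequence.upper()
    profile = []
    # Slide a fixed-size window over the sequence; incomplete windows at the
    # end are skipped.
    for start in range(0, len(seq) - window + 1, step):
        chunk = seq[start:start + window]
        gc_fraction = sum(1 for base in chunk if base in "GC") / window
        profile.append((start, gc_fraction))
    return profile


if __name__ == "__main__":
    for start, value in windowed_gc("GGCCAATT", window=4, step=2):
        print(start, value)
```

A profile like this is the kind of custom numerical data that could be plotted as one track along a chromosome.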

bioRxiv | 2017

Text mining of 15 million full-text scientific articles

David Westergaard; Hans-Henrik Stærfeldt; Christian Tønsberg; Lars Juhl Jensen; Søren Brunak

Across academia and industry, text mining has become a popular strategy for keeping up with the rapid growth of the scientific literature. Text mining of the scientific literature has mostly been carried out on collections of abstracts, due to their availability. Here we present an analysis of 15 million English scientific full-text articles published during the period 1823–2016. We describe the development in article length and publication sub-topics during these nearly 250 years. We showcase the potential of text mining by extracting published protein–protein, disease–gene, and protein subcellular associations using a named entity recognition system, and quantitatively report on their accuracy using gold standard benchmark data sets. We subsequently compare the findings to corresponding results obtained on 16.5 million abstracts included in MEDLINE and show that text mining of full-text articles consistently outperforms using abstracts only.

International Conference on Computational Systems-Biology and Bioinformatics | 2010

The Genome Atlas Resource

Matloob Qureshi; Eva Rotenberg; Hans-Henrik Stærfeldt; Lena Hansson; David W. Ussery

The Genome Atlas is a resource for addressing the challenges of synchronising prokaryotic genomic sequence data from multiple public repositories. This resource can integrate bioinformatic analyses in various data formats and of varying quality. Existing open source tools have been used together with scripts and algorithms developed in a variety of programming languages at the Centre for Biological Sequence Analysis in order to create a three-tier software application for genome analysis. The results are made available via a web interface developed in Java, PHP and Perl CGI. User-configurable and dynamic views of chromosomal maps are made possible through an updated GeneWiz browser (version 0.94), which uses Java to allow rapid zooming in and out of the atlases.

Journal of Molecular Biology | 2000

A DNA structural atlas for Escherichia coli

Anders Gorm Pedersen; Lars Juhl Jensen; Søren Brunak; Hans-Henrik Stærfeldt; David W. Ussery

BMC Genomics | 2005

Pigs in sequence space: A 0.66X coverage pig genome survey based on shotgun sequencing

Rasmus Wernersson; Mikkel H. Schierup; Frank Grønlund Jørgensen; Jan Gorodkin; Frank Panitz; Hans-Henrik Stærfeldt; Ole F. Christensen; Thomas Mailund; Henrik Hornshøj; Ami Klein; Jun Wang; Bin Liu; Songnian Hu; Wei Dong; Wei Li; Gane Ka-Shu Wong; Jun Yu; Jian Wang; Christian Bendixen; Merete Fredholm; Søren Brunak; Huanming Yang; Lars Bolund

Environmental Microbiology | 2006