Network


Latest external collaborations at the country level.

Hotspot


Dive into the research topics where Casper Kaae Sønderby is active.

Publication


Featured research published by Casper Kaae Sønderby.


Nucleic Acids Research | 2016

BloodSpot: a database of gene expression profiles and transcriptional programs for healthy and malignant haematopoiesis

Frederik Otzen Bagger; Damir Sasivarevic; Sina Hadi Sohi; Linea Gøricke Laursen; Sachin Pundhir; Casper Kaae Sønderby; Ole Winther; Nicolas Rapin; Bo T. Porse

Research on human and murine haematopoiesis has resulted in a vast number of gene-expression data sets that can potentially answer questions regarding normal and aberrant blood formation. To researchers and clinicians with limited bioinformatics experience, these data have remained available, yet largely inaccessible. Current databases provide information about gene expression but fail to answer key questions regarding co-regulation, genetic programs or effect on patient survival. To address these shortcomings, we present BloodSpot (www.bloodspot.eu), which includes and greatly extends our previously released database HemaExplorer, a database of gene expression profiles from FACS-sorted healthy and malignant haematopoietic cells. A revised interactive interface simultaneously provides a plot of gene expression along with a Kaplan–Meier analysis and a hierarchical tree depicting the relationship between different cell types in the database. The database now includes 23 high-quality curated data sets relevant to normal and malignant blood formation and, in addition, we have assembled and built a unique integrated data set, BloodPool. BloodPool contains more than 2000 samples assembled from six independent studies on acute myeloid leukemia. Furthermore, we have devised a robust sample integration procedure that allows for sensitive comparison of user-supplied patient samples in a well-defined haematopoietic cellular space.
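For readers who want to reproduce the kind of survival view described above outside the web interface, the sketch below draws Kaplan–Meier curves for patients split by expression of a gene of interest, using the lifelines package. The input file, the column names (time_days, deceased, expression) and the median split are illustrative assumptions, not part of BloodSpot itself.

```python
# Minimal sketch: Kaplan-Meier curves for high vs. low expression of a gene,
# mirroring the survival view BloodSpot pairs with each expression plot.
# File name, column names and the median split are illustrative assumptions.
import pandas as pd
from lifelines import KaplanMeierFitter

df = pd.read_csv("aml_cohort.csv")              # hypothetical cohort table
high = df["expression"] >= df["expression"].median()

kmf = KaplanMeierFitter()
ax = None
for label, mask in [("high expression", high), ("low expression", ~high)]:
    kmf.fit(df.loc[mask, "time_days"], df.loc[mask, "deceased"], label=label)
    ax = kmf.plot_survival_function(ax=ax)
ax.set_xlabel("days")
ax.set_ylabel("survival probability")
```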


arXiv: Quantitative Methods | 2015

Convolutional LSTM Networks for Subcellular Localization of Proteins

Søren Kaae Sønderby; Casper Kaae Sønderby; Henrik Nielsen; Ole Winther

Machine learning is widely used to analyze biological sequence data. Non-sequential models such as SVMs or feed-forward neural networks are often used even though they have no natural way of handling sequences of varying length. Recurrent neural networks such as the long short-term memory (LSTM) model, on the other hand, are designed to handle sequences. In this study we demonstrate that LSTM networks predict the subcellular location of proteins from the protein sequence alone with high accuracy (0.902), outperforming current state-of-the-art algorithms. We further improve the performance by introducing convolutional filters and experiment with an attention mechanism that lets the LSTM focus on specific parts of the protein. Lastly, we introduce new visualizations of both the convolutional filters and the attention mechanisms and show how they can be used to extract biologically relevant knowledge from the LSTM networks.
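The architecture described above lends itself to a compact sketch. The PyTorch snippet below is not the authors' implementation; it is a minimal illustration of the same ingredients, namely 1D convolutional filters over an encoded protein sequence, a bidirectional LSTM and an additive attention that weights the LSTM outputs before the classifier, with all layer sizes chosen arbitrarily.

```python
# Sketch (PyTorch) of a conv + biLSTM + attention classifier for protein
# sequences. Hyperparameters and layer sizes are illustrative, not the paper's.
import torch
import torch.nn as nn

class ConvLSTMAttention(nn.Module):
    def __init__(self, n_features=20, n_classes=10, conv_channels=64,
                 lstm_hidden=128, attn_dim=64):
        super().__init__()
        self.conv = nn.Conv1d(n_features, conv_channels, kernel_size=9, padding=4)
        self.lstm = nn.LSTM(conv_channels, lstm_hidden, batch_first=True,
                            bidirectional=True)
        self.attn_score = nn.Sequential(
            nn.Linear(2 * lstm_hidden, attn_dim), nn.Tanh(),
            nn.Linear(attn_dim, 1))
        self.out = nn.Linear(2 * lstm_hidden, n_classes)

    def forward(self, x):                        # x: (batch, seq_len, n_features)
        h = torch.relu(self.conv(x.transpose(1, 2))).transpose(1, 2)
        h, _ = self.lstm(h)                      # (batch, seq_len, 2 * hidden)
        alpha = torch.softmax(self.attn_score(h).squeeze(-1), dim=1)
        context = (alpha.unsqueeze(-1) * h).sum(dim=1)
        return self.out(context), alpha          # logits and attention weights

model = ConvLSTMAttention()
logits, attn = model(torch.randn(4, 400, 20))    # random placeholder input
```

The returned attention weights are the quantity that, in the spirit of the paper's visualizations, can be inspected per residue to see which parts of the sequence the model attends to.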


Bioinformatics | 2017

DeepLoc: prediction of protein subcellular localization using deep learning

José Juan Almagro Armenteros; Casper Kaae Sønderby; Søren Kaae Sønderby; Henrik Nielsen; Ole Winther

Motivation: The prediction of eukaryotic protein subcellular localization is a well-studied topic in bioinformatics due to its relevance in proteomics research. Many machine learning methods have been successfully applied in this task, but in most of them, predictions rely on annotation of homologues from knowledge databases. For novel proteins where no annotated homologues exist, and for predicting the effects of sequence variants, it is desirable to have methods for predicting protein properties from sequence information only. Results: Here, we present a prediction algorithm using deep neural networks to predict protein subcellular localization relying only on sequence information. At its core, the prediction model uses a recurrent neural network that processes the entire protein sequence and an attention mechanism identifying protein regions important for the subcellular localization. The model was trained and tested on a protein dataset extracted from one of the latest UniProt releases, in which experimentally annotated proteins follow more stringent criteria than previously. We demonstrate that our model achieves a good accuracy (78% for 10 categories; 92% for membrane-bound or soluble), outperforming current state-of-the-art algorithms, including those relying on homology information. Availability and implementation: The method is available as a web server at http://www.cbs.dtu.dk/services/DeepLoc. Example code is available at https://github.com/JJAlmagro/subcellular_localization. The dataset is available at http://www.cbs.dtu.dk/services/DeepLoc/data.php. Contact: [email protected]
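Since the model above relies on sequence information only, the input preparation reduces to encoding amino acids numerically. A minimal one-hot encoding sketch is shown below; the alphabet ordering, the fixed maximum length and the all-zero handling of unknown residues and padding are illustrative choices, not the DeepLoc preprocessing.

```python
# Sketch: one-hot encoding of an amino-acid sequence into a fixed-size matrix,
# the kind of sequence-only input a model like the one above consumes.
import numpy as np

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"
AA_INDEX = {aa: i for i, aa in enumerate(AMINO_ACIDS)}

def one_hot_encode(seq: str, max_len: int = 1000) -> np.ndarray:
    """Return a (max_len, 20) matrix; unknown residues and padding stay all-zero."""
    x = np.zeros((max_len, len(AMINO_ACIDS)), dtype=np.float32)
    for pos, aa in enumerate(seq[:max_len]):
        idx = AA_INDEX.get(aa)
        if idx is not None:
            x[pos, idx] = 1.0
    return x

x = one_hot_encode("MKTAYIAKQRQISFVKSHFSRQLEERLGLIEVQ")
print(x.shape)      # (1000, 20)
```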


Bioinformatics | 2017

An introduction to deep learning on biological sequence data: examples and solutions

Vanessa Isabell Jurtz; Alexander Rosenberg Johansen; Morten Nielsen; José Juan Almagro Armenteros; Henrik Nielsen; Casper Kaae Sønderby; Ole Winther; Søren Kaae Sønderby

Motivation: Deep neural network architectures such as convolutional and long short-term memory networks have become increasingly popular as machine learning tools in recent years. The availability of greater computational resources, more data, new algorithms for training deep models and easy-to-use libraries for implementation and training of neural networks are the drivers of this development. The use of deep learning has been especially successful in image recognition, and the development of tools, applications and code examples is in most cases centered within this field rather than within biology. Results: Here, we aim to further the development of deep learning methods within biology by providing application examples and code templates that are ready to apply and adapt. Given such examples, we illustrate how architectures consisting of convolutional and long short-term memory neural networks can relatively easily be designed and trained to state-of-the-art performance on three biological sequence problems: prediction of subcellular localization, protein secondary structure and the binding of peptides to MHC class II molecules. Availability and implementation: All implementations and datasets are available online to the scientific community at https://github.com/vanessajurtz/lasagne4bio. Contact: [email protected] Supplementary information: Supplementary data are available at Bioinformatics online.
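In the spirit of the code templates the paper provides, the sketch below shows the kind of reusable PyTorch training loop that carries over between the three tasks once the dataset and model are swapped in. The random tensors and the small feed-forward stand-in model are placeholders and are not taken from the lasagne4bio repository.

```python
# Sketch: a generic supervised training loop for sequence classification.
# Dataset tensors and the stand-in model are placeholders for illustration.
import torch
import torch.nn as nn
from torch.utils.data import DataLoader, TensorDataset

# placeholder data: 256 encoded sequences of length 100 with 10-way labels
X = torch.randn(256, 100, 20)
y = torch.randint(0, 10, (256,))
loader = DataLoader(TensorDataset(X, y), batch_size=32, shuffle=True)

model = nn.Sequential(nn.Flatten(), nn.Linear(100 * 20, 128), nn.ReLU(),
                      nn.Linear(128, 10))        # stand-in for a conv/LSTM model
optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)
criterion = nn.CrossEntropyLoss()

for epoch in range(5):
    for xb, yb in loader:
        optimizer.zero_grad()
        loss = criterion(model(xb), yb)
        loss.backward()
        optimizer.step()
    print(f"epoch {epoch}: loss {loss.item():.3f}")
```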


bioRxiv | 2018

scVAE: Variational auto-encoders for single-cell gene expression data

Christopher Heje Grønbech; Maximillian Fornitz Vording; Pascal Timshel; Casper Kaae Sønderby; Tune H. Pers; Ole Winther

We propose a novel variational auto-encoder-based method for analysis of single-cell RNA sequencing (scRNA-seq) data. It avoids data preprocessing by using raw count data as input and can robustly estimate the expected gene expression levels and a latent representation for each cell. We show for several scRNA-seq data sets that our method outperforms recently proposed scRNA-seq methods in clustering cells. Our software tool scVAE supports several count likelihood functions and includes a variant of the variational auto-encoder with a priori clustering in the latent space.
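A minimal sketch of the core idea, a variational auto-encoder whose decoder parameterizes a negative-binomial likelihood over raw counts, is given below in PyTorch. The layer sizes, the gene-wise dispersion parameterization and the log-transformed encoder input are assumptions for illustration, and the clustering variant mentioned above is not included.

```python
# Sketch (PyTorch): VAE with a negative-binomial likelihood on raw counts.
# Sizes, priors and parameterization are illustrative, not the scVAE code.
import torch
import torch.nn as nn
from torch.distributions import NegativeBinomial, Normal, kl_divergence

class CountVAE(nn.Module):
    def __init__(self, n_genes, latent_dim=32, hidden=256):
        super().__init__()
        self.encoder = nn.Sequential(nn.Linear(n_genes, hidden), nn.ReLU())
        self.z_mu = nn.Linear(hidden, latent_dim)
        self.z_logvar = nn.Linear(hidden, latent_dim)
        self.decoder = nn.Sequential(nn.Linear(latent_dim, hidden), nn.ReLU())
        self.nb_logits = nn.Linear(hidden, n_genes)          # success log-odds
        self.nb_log_r = nn.Parameter(torch.zeros(n_genes))   # gene-wise dispersion

    def forward(self, counts):
        h = self.encoder(torch.log1p(counts))                # stabilised encoder input
        mu, logvar = self.z_mu(h), self.z_logvar(h)
        z = mu + torch.randn_like(mu) * (0.5 * logvar).exp() # reparameterisation
        d = self.decoder(z)
        px = NegativeBinomial(total_count=self.nb_log_r.exp(),
                              logits=self.nb_logits(d))
        qz = Normal(mu, (0.5 * logvar).exp())
        pz = Normal(torch.zeros_like(mu), torch.ones_like(mu))
        elbo = px.log_prob(counts).sum(-1) - kl_divergence(qz, pz).sum(-1)
        return -elbo.mean()                                  # loss to minimise

loss = CountVAE(n_genes=2000)(torch.poisson(torch.rand(8, 2000) * 5))  # placeholder counts
```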


International Conference on Bioinformatics | 2017

Deep Recurrent Conditional Random Field Network for Protein Secondary Prediction

Alexander Rosenberg Johansen; Casper Kaae Sønderby; Søren Kaae Sønderby; Ole Winther

Deep learning has become the state-of-the-art method for predicting protein secondary structure from only its amino acid residues and sequence profile. Building upon these results, we propose to combine a bi-directional recurrent neural network (biRNN) with a conditional random field (CRF), which we call the biRNN-CRF. The biRNN-CRF may be seen as an improved alternative to an auto-regressive uni-directional RNN, where predictions are performed sequentially by conditioning on the prediction in the previous time-step. The CRF is instead nearest-neighbor-aware and models the joint distribution of the labels for all time-steps. We condition the CRF on the output of the biRNN, which learns a distributed representation based on the entire sequence. The biRNN-CRF is therefore well suited for the secondary structure task, because a high degree of cross-talk between neighboring elements can be expected. We validate the model on several benchmark datasets. For example, on CB513, a model with 1.7 million parameters achieves a Q8 accuracy of 69.4 for a single model and 70.9 for an ensemble, which to our knowledge is state-of-the-art.
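The joint labelling idea can be sketched compactly. The PyTorch snippet below pairs a bidirectional LSTM that emits per-residue label scores with a linear-chain CRF whose partition function is computed by the forward algorithm; it assumes equal-length, unmasked sequences and arbitrary sizes, and illustrates the general biRNN-CRF construction rather than the paper's configuration.

```python
# Sketch (PyTorch): biLSTM emissions + linear-chain CRF negative log-likelihood.
# Assumes equal-length, unmasked sequences; sizes are illustrative.
import torch
import torch.nn as nn

class BiRNNCRF(nn.Module):
    def __init__(self, n_features=21, n_labels=8, hidden=64):
        super().__init__()
        self.rnn = nn.LSTM(n_features, hidden, batch_first=True, bidirectional=True)
        self.emit = nn.Linear(2 * hidden, n_labels)
        self.trans = nn.Parameter(torch.zeros(n_labels, n_labels))  # trans[i, j]: i -> j

    def neg_log_likelihood(self, x, y):          # x: (B, T, F), y: (B, T) int labels
        e, _ = self.rnn(x)
        e = self.emit(e)                         # (B, T, n_labels) emission scores
        B, T, L = e.shape
        # score of the gold label path: emissions plus pairwise transitions
        gold = e.gather(2, y.unsqueeze(-1)).squeeze(-1).sum(1)
        gold = gold + self.trans[y[:, :-1], y[:, 1:]].sum(1)
        # forward algorithm for the log partition function
        alpha = e[:, 0]                          # (B, L)
        for t in range(1, T):
            alpha = torch.logsumexp(alpha.unsqueeze(2) + self.trans, dim=1) + e[:, t]
        log_z = torch.logsumexp(alpha, dim=1)
        return (log_z - gold).mean()

model = BiRNNCRF()
x, y = torch.randn(4, 50, 21), torch.randint(0, 8, (4, 50))  # placeholder batch
loss = model.neg_log_likelihood(x, y)
```

At prediction time the most likely label sequence would be recovered with Viterbi decoding over the same emission and transition scores, which is omitted here for brevity.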


International Conference on Machine Learning | 2016

Auxiliary deep generative models

Lars Maaløe; Casper Kaae Sønderby; Søren Kaae Sønderby; Ole Winther


International Conference on Learning Representations | 2017

Amortised MAP Inference for Image Super-resolution

Casper Kaae Sønderby; Jose Caballero; Lucas Theis; Wenzhe Shi; Ferenc Huszár


Neural Information Processing Systems | 2016

Ladder Variational Autoencoders

Casper Kaae Sønderby; Tapani Raiko; Lars Maaløe; Søren Kaae Sønderby; Ole Winther


International Conference on Machine Learning | 2016

How to Train Deep Variational Autoencoders and Probabilistic Ladder Networks

Casper Kaae Sønderby; Tapani Raiko; Lars Maaløe; Søren Kaae Sønderby; Ole Winther

Collaboration


Dive into Casper Kaae Sønderby's collaborations.

Top Co-Authors

Ole Winther
Technical University of Denmark

Lars Maaløe
Technical University of Denmark

Henrik Nielsen
Technical University of Denmark

Bo T. Porse
University of Copenhagen

Damir Sasivarevic
Technical University of Denmark