

Publication


Featured research published by Avanti Shrikumar.


Journal of the Royal Society Interface | 2018

Opportunities and obstacles for deep learning in biology and medicine

Travers Ching; Daniel Himmelstein; Brett K. Beaulieu-Jones; Alexandr A. Kalinin; Brian T. Do; Gregory P. Way; Enrico Ferrero; Paul-Michael Agapow; Michael Zietz; Michael M. Hoffman; Wei Xie; Gail Rosen; Benjamin J. Lengerich; Johnny Israeli; Jack Lanchantin; Stephen Woloszynek; Anne E. Carpenter; Avanti Shrikumar; Jinbo Xu; Evan M. Cofer; Christopher A. Lavender; Srinivas C. Turaga; Amr Alexandari; Zhiyong Lu; David J. Harris; Dave DeCaprio; Yanjun Qi; Anshul Kundaje; Yifan Peng; Laura Wiley

Deep learning describes a class of machine learning algorithms that are capable of combining raw inputs into layers of intermediate features. These algorithms have recently shown impressive results across a variety of domains. Biology and medicine are data-rich disciplines, but the data are complex and often ill-understood. Hence, deep learning techniques may be particularly well suited to problems in these fields. We examine applications of deep learning to a variety of biomedical problems—patient classification, fundamental biological processes and treatment of patients—and discuss whether deep learning will be able to transform these tasks or if the biomedical sphere poses unique challenges. Following from an extensive literature review, we find that deep learning has yet to revolutionize biomedicine or definitively resolve any of the most pressing challenges in the field, but promising advances have been made on the prior state of the art. Even though improvements over previous baselines have been modest in general, the recent progress indicates that deep learning methods will provide valuable means for speeding up or aiding human investigation. Though progress has been made in linking a specific neural network's prediction to input features, understanding how users should interpret these models to make testable hypotheses about the system under study remains an open challenge. Furthermore, the limited amount of labelled data for training presents problems in some domains, as do legal and privacy constraints on work with sensitive health records. Nonetheless, we foresee deep learning enabling changes at both bench and bedside with the potential to transform several areas of biology and medicine.


bioRxiv | 2017

Reverse-complement parameter sharing improves deep learning models for genomics

Avanti Shrikumar; Peyton Greenside; Anshul Kundaje

Deep learning approaches that have produced breakthrough predictive models in computer vision, speech recognition and machine translation are now being successfully applied to problems in regulatory genomics. However, deep learning architectures used thus far in genomics are often directly ported from computer vision and natural language processing applications with few, if any, domain-specific modifications. In double-stranded DNA, the same pattern may appear identically on one strand and its reverse complement due to complementary base pairing. Here, we show that conventional deep learning models that do not explicitly model this property can produce substantially different predictions on forward and reverse-complement versions of the same DNA sequence. We present four new convolutional neural network layers that leverage the reverse-complement property of genomic DNA sequence by sharing parameters between forward and reverse-complement representations in the model. These layers guarantee that forward and reverse-complement sequences produce identical predictions within numerical precision. Using experiments on simulated and in vivo transcription factor binding data, we show that our proposed architectures lead to improved performance, faster learning and cleaner internal representations compared to conventional architectures trained on the same data.

Availability: Our implementation is available at https://github.com/kundajelab/keras/tree/keras_1
Contact: [email protected], [email protected], [email protected]
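The parameter-sharing idea in the abstract can be illustrated with a small numpy sketch. The filter shapes, the cross-correlation scan, and the max-pooling readout below are illustrative choices, not the paper's exact layers: with one-hot DNA in A/C/G/T channel order, the reverse complement of a sequence is a flip along both the position and channel axes, so each filter can be shared with its reverse-complement copy, and a strand-symmetric pooling step makes the score identical for a sequence and its reverse complement.

```python
import numpy as np

BASES = "ACGT"  # channel order; the complement of channel i is channel 3 - i

def one_hot(seq):
    x = np.zeros((len(seq), 4))
    x[np.arange(len(seq)), [BASES.index(b) for b in seq]] = 1.0
    return x

def reverse_complement(x):
    # reverse positions, then flip channels (A<->T, C<->G)
    return x[::-1, ::-1]

def rc_shared_conv(x, W):
    """Scan x with each filter and its reverse complement,
    sharing one set of parameters between the two strands."""
    W_rc = W[:, ::-1, ::-1]            # per-filter reverse complement
    filters = np.concatenate([W, W_rc], axis=0)
    L, k = x.shape[0], W.shape[1]
    return np.array([[np.sum(f * x[i:i + k]) for i in range(L - k + 1)]
                     for f in filters])

def rc_invariant_score(x, W):
    # max over positions, then over each forward/reverse-complement
    # filter pair, yields a strand-invariant per-filter score
    out = rc_shared_conv(x, W)
    n = W.shape[0]
    per_filter = np.maximum(out[:n].max(axis=1), out[n:].max(axis=1))
    return per_filter.sum()

rng = np.random.default_rng(0)
W = rng.normal(size=(3, 5, 4))         # 3 filters of width 5
x = one_hot("ACGTTGACCAGT")
assert np.isclose(rc_invariant_score(x, W),
                  rc_invariant_score(reverse_complement(x), W))
```

Here the invariance comes from pooling over each forward/reverse-complement filter pair; the paper instead builds this sharing into dedicated network layers, which is what yields the identical-predictions guarantee within numerical precision throughout the model.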


bioRxiv | 2017

Separable Fully Connected Layers Improve Deep Learning Models For Genomics

Amr Alexandari; Avanti Shrikumar; Anshul Kundaje

Convolutional neural networks are rapidly gaining popularity in regulatory genomics. Typically, these networks have a stack of convolutional and pooling layers, followed by one or more fully connected layers. In genomics, the same positional patterns are often present across multiple convolutional channels. Therefore, in current state-of-the-art networks, there exists significant redundancy in the representations learned by standard fully connected layers. We present a new separable fully connected layer that learns a weight tensor that is the outer product of positional weights and cross-channel weights, thereby allowing the same positional patterns to be applied across multiple convolutional channels. Decomposing positional and cross-channel weights further enables us to readily impose biologically-inspired constraints on positional weights, such as symmetry. We also propose a novel regularizer that penalizes curvature in the positional weights. Using experiments on simulated and in vivo datasets, we show that networks that incorporate our separable fully connected layer outperform conventional models with analogous architectures and the same number of parameters. Additionally, our networks are more robust to hyperparameter tuning, have more informative gradients, and produce importance scores that are more consistent with known biology than conventional deep neural networks.

Availability: Our implementation is available at https://github.com/kundajelab/keras/tree/keras_1
A gist illustrating model setup is at goo.gl/gYooaa
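The outer-product factorization described above can be sketched in a few lines of numpy (array sizes and variable names here are illustrative, not from the paper): a rank-1 separable weight tensor, the outer product of positional weights and cross-channel weights, stands in for a full positions-by-channels weight matrix, cutting the parameter count from positions × channels to positions + channels.

```python
import numpy as np

rng = np.random.default_rng(0)
batch, positions, channels = 16, 100, 32

# output of the convolutional/pooling stack: (batch, positions, channels)
conv_out = rng.normal(size=(batch, positions, channels))

# standard fully connected layer: one weight per (position, channel) pair
W_full = rng.normal(size=(positions, channels))   # 100 * 32 = 3200 parameters

# separable fully connected layer: the weight tensor is the outer product
# of positional and cross-channel weights (100 + 32 = 132 parameters)
w_pos = rng.normal(size=positions)
w_chan = rng.normal(size=channels)
W_sep = np.outer(w_pos, w_chan)

dense_out = np.einsum("bpc,pc->b", conv_out, W_full)
sep_out = np.einsum("bpc,pc->b", conv_out, W_sep)

# the factorization lets the layer be applied channels-first, then positions
assert np.allclose(sep_out, (conv_out @ w_chan) @ w_pos)

# decomposed positional weights make constraints easy to impose, e.g. the
# symmetric positional profile mentioned in the abstract:
w_pos_sym = 0.5 * (w_pos + w_pos[::-1])
assert np.allclose(w_pos_sym, w_pos_sym[::-1])
```

A curvature penalty of the kind the abstract proposes could likewise act directly on `w_pos` (e.g. on its second differences), which is awkward to express on an unfactorized weight matrix.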


bioRxiv | 2018

Deciphering regulatory DNA sequences and noncoding genetic variants using neural network models of massively parallel reporter assays

Rajiv Movva; Peyton Greenside; Avanti Shrikumar; Anshul Kundaje

The relationship between noncoding DNA sequence and gene expression is not well understood. Massively parallel reporter assays (MPRAs), which quantify the regulatory activity of large libraries of DNA sequences in parallel, are a powerful approach to characterize this relationship. We present SNPpet, a convolutional neural network (CNN)-based framework to predict and interpret the regulatory activity of DNA sequences as measured by MPRAs. While our method is generally applicable to a variety of MPRA designs, here we trained SNPpet on the Sharpr-MPRA dataset, which measures the activity of ~500,000 constructs tiling 15,720 regulatory regions in human K562 and HepG2 cell lines. SNPpet's predictions were moderately correlated (Spearman ρ = 0.28) with measured activity and were within the range of replicate concordance of the assay. State-of-the-art model interpretation methods revealed high-resolution predictive regulatory sequence features that overlapped transcription factor (TF) binding motifs. We used the model to investigate the cell type and chromatin state preferences of predictive TF motifs. We explored the ability of SNPpet to predict the allelic effects of regulatory variants in an independent MPRA experiment and to fine-map putative functional SNPs in loci associated with lipid traits. Our results suggest that interpretable deep learning models trained on MPRA data have the potential to reveal meaningful patterns in regulatory DNA sequences and prioritize regulatory genetic variants, especially as larger, higher-quality datasets are produced.


bioRxiv | 2018

Kipoi: accelerating the community exchange and reuse of predictive models for genomics

Ziga Avsec; Roman Kreuzhuber; Johnny Israeli; Nancy Xu; Jun Cheng; Avanti Shrikumar; Abhimanyu Banerjee; Daniel S. Kim; Lara Urban; Anshul Kundaje; Oliver Stegle; Julien Gagneur

Advanced machine learning models applied to large-scale genomics datasets hold the promise to be major drivers for genome science. Once trained, such models can serve as a tool to probe the relationships between data modalities, including the effect of genetic variants on phenotype. However, lack of standardization and limited accessibility of trained models have hampered their impact in practice. To address this, we present Kipoi, a collaborative initiative to define standards and to foster reuse of trained models in genomics. Already, the Kipoi repository contains over 2,000 trained models that cover canonical prediction tasks in transcriptional and post-transcriptional gene regulation. The Kipoi model standard enables automated software installation and provides unified interfaces to apply and interpret models. We illustrate Kipoi through canonical use cases, including model benchmarking, transfer learning, variant effect prediction, and building new models from existing ones. By providing a unified framework to archive, share, access, use, and build on models developed by the community, Kipoi will foster the dissemination and use of machine learning models in genomics.


International Conference on Machine Learning | 2017

Learning Important Features Through Propagating Activation Differences

Avanti Shrikumar; Peyton Greenside; Anshul Kundaje


arXiv: Learning | 2016

Not Just a Black Box: Learning Important Features Through Propagating Activation Differences

Avanti Shrikumar; Peyton Greenside; Anna Shcherbina; Anshul Kundaje


arXiv: Machine Learning | 2018

Learning to Abstain via Curve Optimization

Amr Alexandari; Avanti Shrikumar; Anshul Kundaje


arXiv: Machine Learning | 2018

Selective Classification via Curve Optimization

Amr Alexandari; Avanti Shrikumar; Anshul Kundaje


arXiv: Learning | 2018

Computationally Efficient Measures of Internal Neuron Importance

Avanti Shrikumar; Jocelin Su; Anshul Kundaje

Collaboration


Dive into Avanti Shrikumar's collaborations.

Top Co-Authors

Anna Shcherbina

Massachusetts Institute of Technology
