Publication


Featured research published by Anil Raj.


Genetics | 2014

fastSTRUCTURE: Variational Inference of Population Structure in Large SNP Data Sets

Anil Raj; Matthew Stephens; Jonathan K. Pritchard

Tools for estimating population structure from genetic data are now used in a wide variety of applications in population genetics. However, inferring population structure in large modern data sets imposes severe computational challenges. Here, we develop efficient algorithms for approximate inference of the model underlying the STRUCTURE program using a variational Bayesian framework. Variational methods pose the problem of computing relevant posterior distributions as an optimization problem, allowing us to build on recent advances in optimization theory to develop fast inference tools. In addition, we propose useful heuristic scores to identify the number of populations represented in a data set and a new hierarchical prior to detect weak population structure in the data. We test the variational algorithms on simulated data and illustrate using genotype data from the CEPH–Human Genome Diversity Panel. The variational algorithms are almost two orders of magnitude faster than STRUCTURE and achieve accuracies comparable to those of ADMIXTURE. Furthermore, our results show that the heuristic scores for choosing model complexity provide a reasonable range of values for the number of populations represented in the data, with minimal bias toward detecting structure when it is very weak. Our algorithm, fastSTRUCTURE, is freely available online at http://pritchardlab.stanford.edu/structure.html.
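
As a rough illustration of the model underlying STRUCTURE that the abstract refers to, the sketch below evaluates the log-likelihood of biallelic genotypes under a simple admixture model, in which each individual's allele frequency at a SNP is a mixture of population-level frequencies. It is not the variational algorithm of fastSTRUCTURE, only the likelihood such methods work with. The function name, argument layout, and toy values are assumptions made for this example.

```python
import math


def admixture_log_likelihood(genotypes, q, p):
    """Log-likelihood of genotypes under a simple admixture model (illustrative).

    genotypes[i][l]: copies (0, 1 or 2) of a chosen allele for individual i at SNP l.
    q[i][k]: admixture proportion of individual i from population k (each row sums to 1).
    p[k][l]: frequency of that allele in population k at SNP l,
             assumed strictly between 0 and 1 so the logarithm is defined.
    """
    num_populations = len(p)
    total = 0.0
    for i, row in enumerate(genotypes):
        for l, g in enumerate(row):
            # Allele frequency for this individual: mixture over source populations.
            f = sum(q[i][k] * p[k][l] for k in range(num_populations))
            # Probability of g copies out of 2 under Binomial(2, f).
            total += math.log(math.comb(2, g) * f**g * (1.0 - f) ** (2 - g))
    return total


# Hypothetical toy data: one individual, one SNP, two populations.
toy_ll = admixture_log_likelihood([[1]], [[0.5, 0.5]], [[0.2], [0.8]])
```

In this toy case the individual's mixed allele frequency is 0.5, so the heterozygous genotype has probability 0.5.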


Science | 2013

Identification of genetic variants that affect histone modifications in human cells.

Graham McVicker; Bryce van de Geijn; Jacob F. Degner; Carolyn E. Cain; Nicholas E. Banovich; Anil Raj; Noah Lewellen; Marsha Myrthil; Yoav Gilad; Jonathan K. Pritchard

DNA Differences

The extent to which genetic variation affects an individual's phenotype has been difficult to predict because the majority of variation lies outside the coding regions of genes. Now, three studies examine the extent to which genetic variation affects the chromatin of individuals with diverse ancestry and genetic variation (see the Perspective by Furey and Sethupathy). Kasowski et al. (p. 750, published online 17 October) examined how genetic variation affects differences in chromatin states and their correlation to histone modifications, as well as more general DNA binding factors. Kilpinen et al. (p. 744, published online 17 October) document how genetic variation is linked to allelic specificity in transcription factor binding, histone modifications, and transcription. McVicker et al. (p. 747, published online 17 October) identified how quantitative trait loci affect histone modifications in Yoruban individuals and established which specific transcription factors affect such modifications.

Human genetic variation affects transcription factor binding, leading to histone modifications. [Also see Perspective by Furey and Sethupathy]

Histone modifications are important markers of function and chromatin state, yet the DNA sequence elements that direct them to specific genomic locations are poorly understood. Here, we identify hundreds of quantitative trait loci, genome-wide, that affect histone modification or RNA polymerase II (Pol II) occupancy in Yoruba lymphoblastoid cell lines (LCLs). In many cases, the same variant is associated with quantitative changes in multiple histone marks and Pol II, as well as in deoxyribonuclease I sensitivity and nucleosome positioning. Transcription factor binding site polymorphisms are correlated overall with differences in local histone modification, and we identify specific transcription factors whose binding leads to histone modification in LCLs. Furthermore, variants that affect chromatin at distal regulatory sites frequently also direct changes in chromatin and gene expression at associated promoters.


Science | 2016

RNA splicing is a primary link between genetic variation and disease.

Yang I. Li; Bryce van de Geijn; Anil Raj; David Knowles; Allegra A. Petti; David E. Golan; Yoav Gilad; Jonathan K. Pritchard

RNA splicing links genetics to disease

Many genetic variants associated with disease have no apparent effect on any specific protein coding sequence. Li et al. systematically analyzed the effects of DNA variants on the main steps of gene regulation, from the chromatin state through protein function. One-third of expression quantitative trait loci (QTLs) are mediated through transcriptional processes, not chromatin. Splice QTLs and expression QTLs are roughly comparable in their complex disease risk. Posttranscriptional mechanisms therefore play a large role in translating genotype to phenotype. Science, this issue p. 600

Phenotype is most affected by genetic variants that influence gene expression and transcript splicing.

Noncoding variants play a central role in the genetics of complex traits, but we still lack a full understanding of the molecular pathways through which they act. We quantified the contribution of cis-acting genetic effects at all major stages of gene regulation from chromatin to proteins, in Yoruba lymphoblastoid cell lines (LCLs). About 65% of expression quantitative trait loci (eQTLs) have primary effects on chromatin, whereas the remaining eQTLs are enriched in transcribed regions. Using a novel method, we also detected 2893 splicing QTLs, most of which have little or no effect on gene-level expression. These splicing QTLs are major contributors to complex traits, roughly on a par with variants that affect gene expression levels. Our study provides a comprehensive view of the mechanisms linking genetic variation to variation in human gene regulation.


eLife | 2016

Thousands of novel translated open reading frames in humans inferred by ribosome footprint profiling

Anil Raj; Sidney H. Wang; Heejung Shim; Arbel Harpak; Yang I. Li; Brett W. Engelmann; Matthew Stephens; Yoav Gilad; Jonathan K. Pritchard

Accurate annotation of protein coding regions is essential for understanding how genetic information is translated into function. We describe riboHMM, a new method that uses ribosome footprint data to accurately infer translated sequences. Applying riboHMM to human lymphoblastoid cell lines, we identified 7273 novel coding sequences, including 2442 translated upstream open reading frames. We observed an enrichment of footprints at inferred initiation sites after drug-induced arrest of translation initiation, validating many of the novel coding sequences. The novel proteins exhibit significant selective constraint in the inferred reading frames, suggesting that many are functional. Moreover, ~40% of bicistronic transcripts showed negative correlation in the translation levels of their two coding sequences, suggesting a potential regulatory role for these novel regions. Despite the known limitations of mass spectrometry in detecting proteins expressed at low levels, we estimated a 14% validation rate. Our work significantly expands the set of known coding regions in humans. DOI: http://dx.doi.org/10.7554/eLife.13328.001


Nature Methods | 2017

Allele-specific expression reveals interactions between genetic variation and environment

David Knowles; Joe R. Davis; Hilary Edgington; Anil Raj; Marie-Julie Favé; Xiaowei Zhu; James B. Potash; Myrna M. Weissman; Jianxin Shi; Douglas F. Levinson; Stephen B. Montgomery; Alexis Battle

Identifying interactions between genetics and the environment (GxE) remains challenging. We have developed EAGLE, a hierarchical Bayesian model for identifying GxE interactions based on associations between environmental variables and allele-specific expression. Combining whole-blood RNA-seq with extensive environmental annotations collected from 922 human individuals, we identified 35 GxE interactions, compared with only four using standard GxE interaction testing. EAGLE provides new opportunities for researchers to identify GxE interactions using functional genomic data.


Genome Research | 2018

Impact of regulatory variation across human iPSCs and differentiated cells

Nicholas E. Banovich; Yang I. Li; Anil Raj; Michelle C. Ward; Peyton Greenside; Diego Calderon; Po Yuan Tung; Jonathan E. Burnett; Marsha Myrthil; Samantha M. Thomas; Courtney K. Burrows; Irene Gallego Romero; Bryan J Pavlovic; Anshul Kundaje; Jonathan K. Pritchard; Yoav Gilad

Induced pluripotent stem cells (iPSCs) are an essential tool for studying cellular differentiation and cell types that are otherwise difficult to access. We investigated the use of iPSCs and iPSC-derived cells to study the impact of genetic variation on gene regulation across different cell types and as models for studies of complex disease. To do so, we established a panel of iPSCs from 58 well-studied Yoruba lymphoblastoid cell lines (LCLs); 14 of these lines were further differentiated into cardiomyocytes. We characterized regulatory variation across individuals and cell types by measuring gene expression levels, chromatin accessibility, and DNA methylation. Our analysis focused on a comparison of inter-individual regulatory variation across cell types. While most cell-type-specific regulatory quantitative trait loci (QTLs) lie in chromatin that is open only in the affected cell types, we found that 20% of cell-type-specific regulatory QTLs are in shared open chromatin. This observation motivated us to develop a deep neural network to predict open chromatin regions from DNA sequence alone. Using this approach, we were able to use the sequences of segregating haplotypes to predict the effects of common SNPs on cell-type-specific chromatin accessibility.


IEEE Transactions on Pattern Analysis and Machine Intelligence | 2010

An Information-Theoretic Derivation of Min-Cut-Based Clustering

Anil Raj; Chris H. Wiggins

Min-cut clustering, based on minimizing one of two heuristic cost functions proposed by Shi and Malik nearly a decade ago, has spawned tremendous research, both analytic and algorithmic, in the graph partitioning and image segmentation communities over the last decade. It is, however, unclear if these heuristics can be derived from a more general principle, facilitating generalization to new problem settings. Motivated by an existing graph partitioning framework, we derive relationships between optimizing relevance information, as defined in the Information Bottleneck method, and the regularized cut in a K-partitioned graph. For fast-mixing graphs, we show that the cost functions introduced by Shi and Malik can be well approximated as the rate of loss of predictive information about the location of random walkers on the graph. For graphs drawn from a generative model designed to describe community structure, the optimal information-theoretic partition and the optimal min-cut partition are shown to be the same with high probability.
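
For readers unfamiliar with the cost functions mentioned in the abstract, the sketch below computes the two cut-based costs usually attributed to Shi and Malik, the average cut and the normalized cut, for a two-way partition of a small weighted graph. It only evaluates the costs of a given partition; it does not perform the information-theoretic derivation the paper describes. The function name and the toy graph (two triangles joined by one edge) are assumptions made for this example.

```python
def cut_costs(weights, part_a):
    """Return (average_cut, normalized_cut) for a bipartition of a weighted graph.

    weights: symmetric adjacency matrix as a list of lists.
    part_a: iterable of node indices forming one side; the rest form the other side.
    Both sides are assumed non-empty and to have non-zero total degree.
    """
    n = len(weights)
    a = set(part_a)
    b = set(range(n)) - a
    # Total weight of edges crossing the partition.
    cut = sum(weights[i][j] for i in a for j in b)
    # Association of each side with the whole graph (sum of node degrees).
    assoc_a = sum(weights[i][j] for i in a for j in range(n))
    assoc_b = sum(weights[i][j] for i in b for j in range(n))
    average_cut = cut / len(a) + cut / len(b)
    normalized_cut = cut / assoc_a + cut / assoc_b
    return average_cut, normalized_cut


# Hypothetical toy graph: triangles {0, 1, 2} and {3, 4, 5} joined by edge 2-3.
edges = [(0, 1), (0, 2), (1, 2), (3, 4), (3, 5), (4, 5), (2, 3)]
toy_weights = [[0.0] * 6 for _ in range(6)]
for u, v in edges:
    toy_weights[u][v] = toy_weights[v][u] = 1.0

community_split = cut_costs(toy_weights, [0, 1, 2])  # cuts only the bridge edge
single_node_split = cut_costs(toy_weights, [0])      # isolates one node
```

On this toy graph, splitting along the bridge edge gives lower values of both costs than isolating a single node, which is the behaviour these balanced-cut heuristics are designed to reward.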


PLOS ONE | 2015

msCentipede: Modeling Heterogeneity across Genomic Sites and Replicates Improves Accuracy in the Inference of Transcription Factor Binding

Anil Raj; Heejung Shim; Yoav Gilad; Jonathan K. Pritchard; Matthew Stephens

Understanding global gene regulation depends critically on accurate annotation of regulatory elements that are functional in a given cell type. CENTIPEDE, a powerful, probabilistic framework for identifying transcription factor binding sites from tissue-specific DNase I cleavage patterns and genomic sequence content, leverages the hypersensitivity of factor-bound chromatin and the information in the DNase I spatial cleavage profile characteristic of each DNA binding protein to accurately infer functional factor binding sites. However, the model for the spatial profile in this framework fails to account for the substantial variation in the DNase I cleavage profiles across different binding sites. Neither does it account for variation in the profiles at the same binding site across multiple replicate DNase I experiments, which are increasingly available. In this work, we introduce new methods, based on multi-scale models for inhomogeneous Poisson processes, to account for such variation in DNase I cleavage patterns both within and across binding sites. These models account for the spatial structure in the heterogeneity in DNase I cleavage patterns for each factor. Using DNase-seq measurements assayed in a lymphoblastoid cell line, we demonstrate the improved performance of this model for several transcription factors by comparing against the ChIP-seq peaks for those factors. Finally, we explore the effects of DNase I sequence bias on inference of factor binding using a simple extension to our framework that allows for a more flexible background model. The proposed model can also be easily applied to paired-end ATAC-seq and DNase-seq data. msCentipede, a Python implementation of our algorithm, is available at http://rajanil.github.io/msCentipede.


PLOS ONE | 2011

Identifying Hosts of Families of Viruses: A Machine Learning Approach

Anil Raj; Michael Dewar; Gustavo Palacios; Raul Rabadan; Chris H. Wiggins

Identifying emerging viral pathogens and characterizing their transmission is essential to developing effective public health measures in response to an epidemic. Phylogenetics, though currently the most popular tool used to characterize the likely host of a virus, can be ambiguous when studying species very distant to known species and when there is very little reliable sequence information available in the early stages of the outbreak of disease. Motivated by an existing framework for representing biological sequence information, we learn sparse, tree-structured models, built from decision rules based on subsequences, to predict viral hosts from protein sequence data using popular discriminative machine learning tools. Furthermore, the predictive motifs robustly selected by the learning algorithm are found to show strong host-specificity and occur in highly conserved regions of the viral proteome.
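
To make the idea of decision rules based on subsequences concrete, the sketch below extracts all length-k subsequences (k-mers) from protein sequences and picks the single presence/absence rule that misclassifies the fewest labeled examples. This is a deliberately simplified stand-in: one decision stump rather than the sparse, tree-structured models described in the abstract. The function names, toy sequences, and host labels are hypothetical.

```python
def kmers(sequence, k):
    """Return the set of all length-k subsequences of a sequence."""
    return {sequence[i:i + k] for i in range(len(sequence) - k + 1)}


def best_kmer_rule(sequences, labels, k):
    """Find the k-mer presence rule with the fewest training errors.

    labels are +1 or -1 (two hypothetical host classes).
    Returns (kmer, polarity, errors): the rule predicts `polarity` when the
    k-mer is present in a sequence and `-polarity` when it is absent.
    Ties are broken by taking the first rule found in sorted k-mer order.
    """
    features = [kmers(s, k) for s in sequences]
    candidates = sorted(set().union(*features))
    best = None
    for kmer in candidates:
        for polarity in (1, -1):
            errors = sum(
                1
                for feats, y in zip(features, labels)
                if (polarity if kmer in feats else -polarity) != y
            )
            if best is None or errors < best[2]:
                best = (kmer, polarity, errors)
    return best


# Hypothetical toy protein fragments labeled by host class (+1 or -1).
toy_sequences = ["MKTAYIAK", "MKTAYLLK", "GGSAYIPP", "GGSWWIPP"]
toy_labels = [1, 1, -1, -1]
toy_rule = best_kmer_rule(toy_sequences, toy_labels, 3)
```

A boosting procedure would typically combine many such rules, reweighting the examples after each one; that step is omitted here.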


Nature Methods | 2014

The genome shows its sensitive side

Anil Raj; Graham McVicker

New methods for measuring the sensitivity of chromatin to DNase digestion and Tn5 transposition help us map and interpret the genome's regulatory sequences.
