Athma A. Pai

Massachusetts Institute of Technology

Publication

Featured research published by Athma A. Pai.

Nature | 2010

Understanding mechanisms underlying human gene expression variation with RNA sequencing

Joseph K. Pickrell; John C. Marioni; Athma A. Pai; Jacob F. Degner; Barbara E. Engelhardt; Everlyne Nkadori; Jean-Baptiste Veyrieras; Matthew Stephens; Yoav Gilad; Jonathan K. Pritchard

Understanding the genetic mechanisms underlying natural variation in gene expression is a central goal of both medical and evolutionary genetics, and studies of expression quantitative trait loci (eQTLs) have become an important tool for achieving this goal. Although all eQTL studies so far have assayed messenger RNA levels using expression microarrays, recent advances in RNA sequencing enable the analysis of transcript variation at unprecedented resolution. We sequenced RNA from 69 lymphoblastoid cell lines derived from unrelated Nigerian individuals that have been extensively genotyped by the International HapMap Project. By pooling data from all individuals, we generated a map of the transcriptional landscape of these cells, identifying extensive use of unannotated untranslated regions and more than 100 new putative protein-coding exons. Using the genotypes from the HapMap project, we identified more than a thousand genes at which genetic variation influences overall expression levels or splicing. We demonstrate that eQTLs near genes generally act by a mechanism involving allele-specific expression, and that variation that influences the inclusion of an exon is enriched within and near the consensus splice sites. Our results illustrate the power of high-throughput sequencing for the joint analysis of variation in transcription, splicing and allele-specific expression across individuals.
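As a rough illustration of what mapping an eQTL involves, the sketch below regresses a gene's expression level on allele dosage (0, 1, or 2 copies of one allele) at a single SNP. This is a minimal example on made-up numbers, not the model used in the paper; the function name `eqtl_slope` and the toy data are assumptions introduced here.

```python
from statistics import mean


def eqtl_slope(genotypes, expression):
    """Least-squares slope and Pearson r of expression on allele dosage.

    genotypes: allele dosages (0, 1, or 2), one per individual.
    expression: expression levels for the same individuals, same order.
    """
    if len(genotypes) != len(expression):
        raise ValueError("genotypes and expression must have equal length")
    g_mean, e_mean = mean(genotypes), mean(expression)
    sxy = sum((g - g_mean) * (e - e_mean) for g, e in zip(genotypes, expression))
    sxx = sum((g - g_mean) ** 2 for g in genotypes)
    syy = sum((e - e_mean) ** 2 for e in expression)
    if sxx == 0 or syy == 0:
        raise ValueError("genotype and expression must both vary")
    slope = sxy / sxx               # expression change per extra allele copy
    r = sxy / (sxx * syy) ** 0.5    # strength of the linear association
    return slope, r


# Hypothetical toy data: six individuals, two per genotype class.
dosage = [0, 0, 1, 1, 2, 2]
expr = [1.0, 1.2, 2.0, 2.2, 3.0, 3.2]
slope, r = eqtl_slope(dosage, expr)
print(f"slope={slope:.2f}, r={r:.3f}")
```

A genome-wide study repeats a test of this kind for many SNP-gene pairs and then corrects for multiple testing, which this sketch leaves out.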

American Journal of Human Genetics | 2006

NOTCH2 Mutations Cause Alagille Syndrome, a Heterogeneous Disorder of the Notch Signaling Pathway

Ryan McDaniell; Daniel M. Warthen; Pedro A. Sanchez-Lara; Athma A. Pai; Ian D. Krantz; David A. Piccoli; Nancy B. Spinner

Alagille syndrome (AGS) is caused by mutations in the gene for the Notch signaling pathway ligand Jagged1 (JAG1), which are found in 94% of patients. To identify the cause of disease in patients without JAG1 mutations, we screened 11 JAG1 mutation-negative probands with AGS for alterations in the gene for the Notch2 receptor (NOTCH2). We found NOTCH2 mutations segregating in two families and identified five affected individuals. Renal manifestations, a minor feature in AGS, were present in all the affected individuals. This demonstrates that AGS is a heterogeneous disorder and implicates NOTCH2 mutations in human disease.

Nature | 2012

DNase I sensitivity QTLs are a major determinant of human expression variation

Jacob F. Degner; Athma A. Pai; Roger Pique-Regi; Jean-Baptiste Veyrieras; Daniel J. Gaffney; Joseph K. Pickrell; Sherryl De Leon; Katelyn Michelini; Noah Lewellen; Gregory E. Crawford; Matthew Stephens; Yoav Gilad; Jonathan K. Pritchard

The mapping of expression quantitative trait loci (eQTLs) has emerged as an important tool for linking genetic variation to changes in gene regulation. However, it remains difficult to identify the causal variants underlying eQTLs, and little is known about the regulatory mechanisms by which they act. Here we show that genetic variants that modify chromatin accessibility and transcription factor binding are a major mechanism through which genetic variation leads to gene expression differences among humans. We used DNase I sequencing to measure chromatin accessibility in 70 Yoruba lymphoblastoid cell lines, for which genome-wide genotypes and estimates of gene expression levels are also available. We obtained a total of 2.7 billion uniquely mapped DNase I-sequencing (DNase-seq) reads, which allowed us to produce genome-wide maps of chromatin accessibility for each individual. We identified 8,902 locations at which the DNase-seq read depth correlated significantly with genotype at a nearby single nucleotide polymorphism or insertion/deletion (false discovery rate = 10%). We call such variants ‘DNase I sensitivity quantitative trait loci’ (dsQTLs). We found that dsQTLs are strongly enriched within inferred transcription factor binding sites and are frequently associated with allele-specific changes in transcription factor binding. A substantial fraction (16%) of dsQTLs are also associated with variation in the expression levels of nearby genes (that is, these loci are also classified as eQTLs). Conversely, we estimate that as many as 55% of eQTL single nucleotide polymorphisms are also dsQTLs. Our observations indicate that dsQTLs are highly abundant in the human genome and are likely to be important contributors to phenotypic variation.
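The abstract reports associations at a false discovery rate of 10% but does not say which procedure was used. As one common way to control FDR, the sketch below applies the Benjamini-Hochberg step-up procedure to a list of p-values. The choice of Benjamini-Hochberg, the function name, and the example p-values are all assumptions made for illustration.

```python
def benjamini_hochberg(pvalues, fdr=0.10):
    """Return the indices of tests rejected by the Benjamini-Hochberg procedure.

    Sort the p-values, find the largest rank k with p_(k) <= fdr * k / m,
    and reject the k tests with the smallest p-values.
    """
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    cutoff_rank = 0
    for rank, idx in enumerate(order, start=1):
        if pvalues[idx] <= fdr * rank / m:
            cutoff_rank = rank  # keep the largest rank that passes
    return sorted(order[:cutoff_rank])


# Hypothetical p-values from five genotype/read-depth association tests.
pvals = [0.001, 0.5, 0.02, 0.9, 0.03]
significant = benjamini_hochberg(pvals, fdr=0.10)
print(significant)
```

With these toy inputs, the three smallest p-values fall under their rank-scaled thresholds (0.02, 0.04, 0.06), so tests 0, 2, and 4 are reported.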

Genome Research | 2011

Accurate inference of transcription factor binding from DNA sequence and chromatin accessibility data

Roger Pique-Regi; Jacob F. Degner; Athma A. Pai; Daniel J. Gaffney; Yoav Gilad; Jonathan K. Pritchard

Accurate functional annotation of regulatory elements is essential for understanding global gene regulation. Here, we report a genome-wide map of 827,000 transcription factor binding sites in human lymphoblastoid cell lines, which is comprised of sites corresponding to 239 position weight matrices of known transcription factor binding motifs, and 49 novel sequence motifs. To generate this map, we developed a probabilistic framework that integrates cell- or tissue-specific experimental data such as histone modifications and DNase I cleavage patterns with genomic information such as gene annotation and evolutionary conservation. Comparison to empirical ChIP-seq data suggests that our method is highly accurate yet has the advantage of targeting many factors in a single assay. We anticipate that this approach will be a valuable tool for genome-wide studies of gene regulation in a wide variety of cell types or tissues under diverse conditions.

Bioinformatics | 2009

Effect of read-mapping biases on detecting allele-specific expression from RNA-sequencing data

Jacob F. Degner; John C. Marioni; Athma A. Pai; Joseph K. Pickrell; Everlyne Nkadori; Yoav Gilad; Jonathan K. Pritchard

Motivation: Next-generation sequencing has become an important tool for genome-wide quantification of DNA and RNA. However, a major technical hurdle lies in the need to map short sequence reads back to their correct locations in a reference genome. Here, we investigate the impact of SNP variation on the reliability of read-mapping in the context of detecting allele-specific expression (ASE). Results: We generated 16 million 35 bp reads from mRNA of each of two HapMap Yoruba individuals. When we mapped these reads to the human genome we found that, at heterozygous SNPs, there was a significant bias toward higher mapping rates of the allele in the reference sequence, compared with the alternative allele. Masking known SNP positions in the genome sequence eliminated the reference bias but, surprisingly, did not lead to more reliable results overall. We find that even after masking, ∼5–10% of SNPs still have an inherent bias toward more effective mapping of one allele. Filtering out inherently biased SNPs removes 40% of the top signals of ASE. The remaining SNPs showing ASE are enriched in genes previously known to harbor cis-regulatory variation or known to show uniparental imprinting. Our results have implications for a variety of applications involving detection of alternate alleles from short-read sequence data. Availability: Scripts, written in Perl and R, for simulating short reads, masking SNP variation in a reference genome and analyzing the simulation output are available upon request from JFD. Raw short read data were deposited in GEO (http://www.ncbi.nlm.nih.gov/geo/) under accession number GSE18156. Supplementary information: Supplementary data are available at Bioinformatics online.
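To make the idea of testing for allele-specific expression concrete, the sketch below runs an exact two-sided binomial test on reference and alternative read counts at one heterozygous SNP. It is a minimal illustration, not the authors' scripts (which the abstract says were written in Perl and R). The function name and the `p_ref` parameter are assumptions: `p_ref` is the expected fraction of reference-allele reads under no ASE, 0.5 by default, and could be set to a simulation-based estimate when mapping favors one allele.

```python
from math import comb


def binomial_two_sided_p(ref_reads, alt_reads, p_ref=0.5):
    """Exact two-sided binomial p-value for allelic imbalance at one SNP.

    Sums the probabilities of all outcomes that are no more likely than
    the observed reference-read count under Binomial(n, p_ref).
    """
    n = ref_reads + alt_reads
    if n == 0:
        raise ValueError("at least one read is required")
    probs = [comb(n, k) * p_ref**k * (1 - p_ref) ** (n - k) for k in range(n + 1)]
    observed = probs[ref_reads]
    # Small relative tolerance so floating-point ties count as "as extreme".
    total = sum(p for p in probs if p <= observed * (1 + 1e-9))
    return min(1.0, total)


# Hypothetical counts: balanced reads versus all reads on the reference allele.
print(binomial_two_sided_p(5, 5))    # no evidence of imbalance
print(binomial_two_sided_p(10, 0))   # strong imbalance
```

Testing against a fixed 0.5 is exactly where the reference-mapping bias described above would inflate ASE calls, which is why the expected fraction is left as a parameter here.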

PLOS Genetics | 2010

Noisy Splicing Drives mRNA Isoform Diversity in Human Cells

Joseph K. Pickrell; Athma A. Pai; Yoav Gilad; Jonathan K. Pritchard

While the majority of multiexonic human genes show some evidence of alternative splicing, it is unclear what fraction of observed splice forms is functionally relevant. In this study, we examine the extent of alternative splicing in human cells using deep RNA sequencing and de novo identification of splice junctions. We demonstrate the existence of a large class of low abundance isoforms, encompassing approximately 150,000 previously unannotated splice junctions in our data. Newly-identified splice sites show little evidence of evolutionary conservation, suggesting that the majority are due to erroneous splice site choice. We show that sequence motifs involved in the recognition of exons are enriched in the vicinity of unconserved splice sites. We estimate that the average intron has a splicing error rate of approximately 0.7% and show that introns in highly expressed genes are spliced more accurately, likely due to their shorter length. These results implicate noisy splicing as an important property of genome evolution.

PLOS Genetics | 2012

Controls of Nucleosome Positioning in the Human Genome

Daniel J. Gaffney; Graham McVicker; Athma A. Pai; Yvonne N. Fondufe-Mittendorf; Noah Lewellen; Katelyn Michelini; Jonathan Widom; Yoav Gilad; Jonathan K. Pritchard

Nucleosomes are important for gene regulation because their arrangement on the genome can control which proteins bind to DNA. Currently, few human nucleosomes are thought to be consistently positioned across cells; however, this has been difficult to assess due to the limited resolution of existing data. We performed paired-end sequencing of micrococcal nuclease-digested chromatin (MNase–seq) from seven lymphoblastoid cell lines and mapped over 3.6 billion MNase–seq fragments to the human genome to create the highest-resolution map of nucleosome occupancy to date in a human cell type. In contrast to previous results, we find that most nucleosomes have more consistent positioning than expected by chance and a substantial fraction (8.7%) of nucleosomes have moderate to strong positioning. In aggregate, nucleosome sequences have 10 bp periodic patterns in dinucleotide frequency and DNase I sensitivity; and, across cells, nucleosomes frequently have translational offsets that are multiples of 10 bp. We estimate that almost half of the genome contains regularly spaced arrays of nucleosomes, which are enriched in active chromatin domains. Single nucleotide polymorphisms that reduce DNase I sensitivity can disrupt the phasing of nucleosome arrays, which indicates that they often result from positioning against a barrier formed by other proteins. However, nucleosome arrays can also be created by DNA sequence alone. The most striking example is an array of over 400 nucleosomes on chromosome 12 that is created by tandem repetition of sequences with strong positioning properties. In summary, a large fraction of nucleosomes are consistently positioned—in some regions because they adopt favored sequence positions, and in other regions because they are forced into specific arrangements by chromatin remodeling or DNA binding proteins.

PLOS Genetics | 2011

A Genome-Wide Study of DNA Methylation Patterns and Gene Expression Levels in Multiple Human and Chimpanzee Tissues

Athma A. Pai; Jordana T. Bell; John C. Marioni; Jonathan K. Pritchard; Yoav Gilad

The modification of DNA by methylation is an important epigenetic mechanism that affects the spatial and temporal regulation of gene expression. Methylation patterns have been described in many contexts within and across a range of species. However, the extent to which changes in methylation might underlie inter-species differences in gene regulation, in particular between humans and other primates, has not yet been studied. To this end, we studied DNA methylation patterns in livers, hearts, and kidneys from multiple humans and chimpanzees, using tissue samples for which genome-wide gene expression data were also available. Using the multi-species gene expression and methylation data for 7,723 genes, we were able to study the role of promoter DNA methylation in the evolution of gene regulation across tissues and species. We found that inter-tissue methylation patterns are often conserved between humans and chimpanzees. However, we also found a large number of gene expression differences between species that might be explained, at least in part, by corresponding differences in methylation levels. In particular, we estimate that, in the tissues we studied, inter-species differences in promoter methylation might underlie as much as 12%–18% of differences in gene expression levels between humans and chimpanzees.

Proceedings of the National Academy of Sciences of the United States of America | 2012

Deciphering the genetic architecture of variation in the immune response to Mycobacterium tuberculosis infection

Luis B. Barreiro; Ludovic Tailleux; Athma A. Pai; Brigitte Gicquel; John C. Marioni; Yoav Gilad

Tuberculosis (TB) is a major public health problem. One-third of the world's population is estimated to be infected with Mycobacterium tuberculosis (MTB), the etiological agent causing TB, and active disease kills nearly 2 million individuals worldwide every year. Several lines of evidence indicate that interindividual variation in susceptibility to TB has a heritable component, yet we still know little about the underlying genetic architecture. To address this, we performed a genome-wide mapping study of loci that are associated with functional variation in immune response to MTB. Specifically, we characterized transcript and protein expression levels and mapped expression quantitative trait loci (eQTL) in primary dendritic cells (DCs) from 65 individuals, before and after infection with MTB. We found 198 response eQTL, namely loci that were associated with variation in gene expression levels in either untreated or MTB-infected DCs, but not both. These response eQTL are associated with natural regulatory variation that likely affects (directly or indirectly) host interaction with MTB. Indeed, when we integrated our data with results from a genome-wide association study (GWAS) for pulmonary TB, we found that the response eQTL were more likely to be genetically associated with the disease. We thus identified a number of candidate loci, including the MAPK phosphatase DUSP14 in particular, that are promising susceptibility genes to pulmonary TB.

Genome Biology | 2012

Dissecting the regulatory architecture of gene expression QTLs

Daniel J. Gaffney; Jean-Baptiste Veyrieras; Jacob F. Degner; Roger Pique-Regi; Athma A. Pai; Gregory E. Crawford; Matthew Stephens; Yoav Gilad; Jonathan K. Pritchard

Background: Expression quantitative trait loci (eQTLs) are likely to play an important role in the genetics of complex traits; however, their functional basis remains poorly understood. Using the HapMap lymphoblastoid cell lines, we combine 1000 Genomes genotypes and an extensive catalogue of human functional elements to investigate the biological mechanisms that eQTLs perturb. Results: We use a Bayesian hierarchical model to estimate the enrichment of eQTLs in a wide variety of regulatory annotations. We find that approximately 40% of eQTLs occur in open chromatin, and that they are particularly enriched in transcription factor binding sites, suggesting that many directly impact protein-DNA interactions. Analysis of core promoter regions shows that eQTLs also frequently disrupt some known core promoter motifs but, surprisingly, are not enriched in other well-known motifs such as the TATA box. We also show that information from regulatory annotations alone, when weighted by the hierarchical model, can provide a meaningful ranking of the SNPs that are most likely to drive gene expression variation. Conclusions: Our study demonstrates how regulatory annotation and the association signal derived from eQTL-mapping can be combined into a single framework. We used this approach to further our understanding of the biology that drives human gene expression variation, and of the putatively causal SNPs that underlie it.
