Philip Cayting
Yale University
Publication
Featured research published by Philip Cayting.
Nature | 2012
Mark Gerstein; Anshul Kundaje; Manoj Hariharan; Stephen G. Landt; Koon Kiu Yan; Chao Cheng; Xinmeng Jasmine Mu; Ekta Khurana; Joel Rozowsky; Roger P. Alexander; Renqiang Min; Pedro Alves; Alexej Abyzov; Nick Addleman; Nitin Bhardwaj; Alan P. Boyle; Philip Cayting; Alexandra Charos; David Chen; Yong Cheng; Declan Clarke; Catharine L. Eastman; Ghia Euskirchen; Seth Frietze; Yao Fu; Jason Gertz; Fabian Grubert; Arif Harmanci; Preti Jain; Maya Kasowski
Transcription factors bind in a combinatorial fashion to specify the on-and-off states of genes; the ensemble of these binding events forms a regulatory network, constituting the wiring diagram for a cell. To examine the principles of the human transcriptional regulatory network, we determined the genomic binding information of 119 transcription-related factors in over 450 distinct experiments. We found the combinatorial, co-association of transcription factors to be highly context specific: distinct combinations of factors bind at specific genomic locations. In particular, there are significant differences in the binding proximal and distal to genes. We organized all the transcription factor binding into a hierarchy and integrated it with other genomic information (for example, microRNA regulation), forming a dense meta-network. Factors at different levels have different properties; for instance, top-level transcription factors more strongly influence expression and middle-level ones co-regulate targets to mitigate information-flow bottlenecks. Moreover, these co-regulations give rise to many enriched network motifs (for example, noise-buffering feed-forward loops). Finally, more connected network components are under stronger selection and exhibit a greater degree of allele-specific activity (that is, differential binding to the two parental alleles). The regulatory information obtained in this study will be crucial for interpreting personal genome sequences and understanding basic principles of human biology and disease.
Genome Research | 2012
Stephen G. Landt; Georgi K. Marinov; Anshul Kundaje; Pouya Kheradpour; Florencia Pauli; Serafim Batzoglou; Bradley E. Bernstein; Peter J. Bickel; James B. Brown; Philip Cayting; Yiwen Chen; Gilberto DeSalvo; Charles B. Epstein; Katherine I. Fisher-Aylor; Ghia Euskirchen; Mark Gerstein; Jason Gertz; Alexander J. Hartemink; Michael M. Hoffman; Vishwanath R. Iyer; Youngsook L. Jung; Subhradip Karmakar; Manolis Kellis; Peter V. Kharchenko; Qunhua Li; Tao Liu; X. Shirley Liu; Lijia Ma; Aleksandar Milosavljevic; Richard M. Myers
Chromatin immunoprecipitation (ChIP) followed by high-throughput DNA sequencing (ChIP-seq) has become a valuable and widely used approach for mapping the genomic location of transcription-factor binding and histone modifications in living cells. Despite its widespread use, there are considerable differences in how these experiments are conducted, how the results are scored and evaluated for quality, and how the data and metadata are archived for public use. These practices affect the quality and utility of any global ChIP experiment. Through our experience in performing ChIP-seq experiments, the ENCODE and modENCODE consortia have developed a set of working standards and guidelines for ChIP experiments that are updated routinely. The current guidelines address antibody validation, experimental replication, sequencing depth, data and metadata reporting, and data quality assessment. We discuss how ChIP quality, assessed in these ways, affects different uses of ChIP-seq data. All data sets used in the analysis have been deposited for public viewing and downloading at the ENCODE (http://encodeproject.org/ENCODE/) and modENCODE (http://www.modencode.org/) portals.
Genome Biology | 2012
John A. Stamatoyannopoulos; Michael Snyder; Ross C. Hardison; Bing Ren; Thomas R. Gingeras; David M. Gilbert; Mark Groudine; M. A. Bender; Rajinder Kaul; Theresa K. Canfield; Erica Giste; Audra K. Johnson; Mia Zhang; Gayathri Balasundaram; Rachel Byron; Vaughan Roach; Peter J. Sabo; Richard Sandstrom; A Sandra Stehling; Robert E. Thurman; Sherman M. Weissman; Philip Cayting; Manoj Hariharan; Jin Lian; Yong Cheng; Stephen G. Landt; Zhihai Ma; Barbara J. Wold; Job Dekker; Gregory E. Crawford
To complement the human Encyclopedia of DNA Elements (ENCODE) project and to enable a broad range of mouse genomics efforts, the Mouse ENCODE Consortium is applying the same experimental pipelines developed for human ENCODE to annotate the mouse genome.
Genome Biology | 2009
Jan O. Korbel; Alexej Abyzov; Xinmeng Jasmine Mu; Nicholas Carriero; Philip Cayting; Zhengdong D. Zhang; Michael Snyder; Mark Gerstein
Personal-genomics endeavors, such as the 1000 Genomes project, are generating maps of genomic structural variants by analyzing ends of massively sequenced genome fragments. To process these we developed Paired-End Mapper (PEMer; http://sv.gersteinlab.org/pemer). This comprises an analysis pipeline, compatible with several next-generation sequencing platforms; simulation-based error models, yielding confidence-values for each structural variant; and a back-end database. The simulations demonstrated high structural variant reconstruction efficiency for PEMer's coverage-adjusted multi-cutoff scoring-strategy and showed its relative insensitivity to base-calling errors.
Nucleic Acids Research | 2007
John E. Karro; Yangpan Yan; Deyou Zheng; Zhaolei Zhang; Nicholas Carriero; Philip Cayting; Paul Harrison; Mark Gerstein
The Pseudogene.org knowledgebase serves as a comprehensive repository for pseudogene annotation. The definition of a pseudogene varies within the literature, resulting in significantly different approaches to the problem of identification. Consequently, it is difficult to maintain a consistent collection of pseudogenes in detail necessary for their effective use. Our database is designed to address this issue. It integrates a variety of heterogeneous resources and supports a subset structure that highlights specific groups of pseudogenes that are of interest to the research community. Tools are provided for the comparison of sets and the creation of layered set unions, enabling researchers to derive a current ‘consensus’ set of pseudogenes. Additional features include versatile search, the capacity for robust interaction with other databases, the ability to reconstruct older versions of the database (accounting for changing genome builds) and an underlying object-oriented interface designed for researchers with a minimal knowledge of programming. At the present time, the database contains more than 100 000 pseudogenes spanning 64 prokaryote and 11 eukaryote genomes, including a collection of human annotations compiled from 16 sources.
Nature Biotechnology | 2010
Hugo Y. K. Lam; Xinmeng Jasmine Mu; Adrian M. Stütz; Andrea Tanzer; Philip Cayting; Michael Snyder; Philip M. Kim; Jan O. Korbel; Mark Gerstein
Structural variants (SVs) are a major source of human genomic variation; however, characterizing them at nucleotide resolution remains challenging. Here we assemble a library of breakpoints at nucleotide resolution from collating and standardizing ~2,000 published SVs. For each breakpoint, we infer its ancestral state (through comparison to primate genomes) and its mechanism of formation (e.g., nonallelic homologous recombination, NAHR). We characterize breakpoint sequences with respect to genomic landmarks, chromosomal location, sequence motifs and physical properties, finding that the occurrence of insertions and deletions is more balanced than previously reported and that NAHR-formed breakpoints are associated with relatively rigid, stable DNA helices. Finally, we demonstrate an approach, BreakSeq, for scanning the reads from short-read sequenced genomes against our breakpoint library to accurately identify previously overlooked SVs, which we then validate by PCR. As new data become available, we expect our BreakSeq approach will become more sensitive and facilitate rapid SV genotyping of personal genomes.
Nature | 2014
Yong Cheng; Zhihai Ma; Bong-Hyun Kim; Weisheng Wu; Philip Cayting; Alan P. Boyle; Vasavi Sundaram; Xiaoyun Xing; Nergiz Dogan; Jingjing Li; Ghia Euskirchen; Shin Lin; Yiing Lin; Axel Visel; Trupti Kawli; Xinqiong Yang; Dorrelyn Patacsil; Cheryl A. Keller; Belinda Giardine; Anshul Kundaje; Ting Wang; Len A. Pennacchio; Zhiping Weng; Ross C. Hardison; Michael Snyder
To broaden our understanding of the evolution of gene regulation mechanisms, we generated occupancy profiles for 34 orthologous transcription factors (TFs) in human–mouse erythroid progenitor, lymphoblast and embryonic stem-cell lines. By combining the genome-wide transcription factor occupancy repertoires, associated epigenetic signals, and co-association patterns, here we deduce several evolutionary principles of gene regulatory features operating since the mouse and human lineages diverged. The genomic distribution profiles, primary binding motifs, chromatin states, and DNA methylation preferences are well conserved for TF-occupied sequences. However, the extent to which orthologous DNA segments are bound by orthologous TFs varies both among TFs and with genomic location: binding at promoters is more highly conserved than binding at distal elements. Notably, occupancy-conserved TF-occupied sequences tend to be pleiotropic; they function in several tissues and also co-associate with many TFs. Single nucleotide variants at sites with potential regulatory functions are enriched in occupancy-conserved TF-occupied sequences.
Nature | 2014
Alan P. Boyle; Carlos L. Araya; Cathleen M. Brdlik; Philip Cayting; Chao Cheng; Yong Cheng; Kathryn E. Gardner; LaDeana W. Hillier; J. Janette; Lixia Jiang; Dionna M. Kasper; Trupti Kawli; Pouya Kheradpour; Anshul Kundaje; Jingyi Jessica Li; Lijia Ma; Wei Niu; E. Jay Rehm; Joel Rozowsky; Matthew Slattery; Rebecca Spokony; Robert Terrell; Dionne Vafeados; Daifeng Wang; Peter Weisdepp; Yi-Chieh Wu; Dan Xie; Koon Kiu Yan; Elise A. Feingold; Peter J. Good
Despite the large evolutionary distances between metazoan species, they can show remarkable commonalities in their biology, and this has helped to establish fly and worm as model organisms for human biology. Although studies of individual elements and factors have explored similarities in gene regulation, a large-scale comparative analysis of basic principles of transcriptional regulatory features is lacking. Here we map the genome-wide binding locations of 165 human, 93 worm and 52 fly transcription regulatory factors, generating a total of 1,019 data sets from diverse cell types, developmental stages, or conditions in the three species, of which 498 (48.9%) are presented here for the first time. We find that structural properties of regulatory networks are remarkably conserved and that orthologous regulatory factor families recognize similar binding motifs in vivo and show some similar co-associations. Our results suggest that gene-regulatory properties previously observed for individual factors are general principles of metazoan regulation that are remarkably well-preserved despite extensive functional divergence of individual network connections. The comparative maps of regulatory circuitry provided here will drive an improved understanding of the regulatory underpinnings of model organism biology and how these relate to human biology, development and disease.
Genome Biology | 2009
Suganthi Balasubramanian; Deyou Zheng; Yuen-Jong Liu; Gang Fang; Adam Frankish; Nicholas Carriero; R. Robilotto; Philip Cayting; Mark Gerstein
Background: The availability of genome sequences of numerous organisms allows comparative study of pseudogenes in syntenic regions. Conservation of pseudogenes suggests that they might have a functional role in some instances. Results: We report the first large-scale comparative analysis of ribosomal protein pseudogenes in four mammalian genomes (human, chimpanzee, mouse and rat). To this end, we have assigned these pseudogenes in the four organisms using an automated pipeline and make the results available online. Each organism has a large number of ribosomal protein pseudogenes (approximately 1,400 to 2,800). The majority of them are processed (generated by retrotransposition). However, we do not see a correlation between the number of pseudogenes associated with a ribosomal protein gene and its mRNA abundance. Analysis of pseudogenes in syntenic regions between species shows that most are conserved between human and chimpanzee, but very few are conserved between primates and rodents. Interestingly, syntenic pseudogenes have a lower rate of nucleotide substitution than their surrounding intergenic DNA. Moreover, evidence from expressed sequence tags indicates that two pseudogenes conserved between human and mouse are transcribed. Detailed analysis shows that one of them, the pseudogene of RPS27, is likely to be a protein-coding gene. This is significant as previous reports indicated there are exactly 80 ribosomal protein genes encoded by the human genome. Conclusions: Our analysis indicates that processed ribosomal protein pseudogenes abound in mammalian genomes, but few of these are conserved between primates and rodents. This highlights the large amount of recent retrotranspositional activity in mammals and a relatively larger amount of it in the rodent lineage.
Nucleic Acids Research | 2009
Hugo Y. K. Lam; Ekta Khurana; Gang Fang; Philip Cayting; Nicholas Carriero; Kei-Hoi Cheung; Mark Gerstein
Pseudofam (http://pseudofam.pseudogene.org) is a database of pseudogene families based on the protein families from the Pfam database. It provides resources for analyzing the family structure of pseudogenes including query tools, statistical summaries and sequence alignments. The current version of Pseudofam contains more than 125 000 pseudogenes identified from 10 eukaryotic genomes and aligned within nearly 3000 families (approximately one-third of the total families in PfamA). Pseudofam uses a large-scale parallelized homology search algorithm (implemented as an extension of the PseudoPipe pipeline) to identify pseudogenes. Each identified pseudogene is assigned to its parent protein family and subsequently aligned to each other by transferring the parent domain alignments from the Pfam family. Pseudogenes are also given additional annotation based on an ontology, reflecting their mode of creation and subsequent history. In particular, our annotation highlights the association of pseudogene families with genomic features, such as segmental duplications. In addition, pseudogene families are associated with key statistics, which identify outlier families with an unusual degree of pseudogenization. The statistics also show how the number of genes and pseudogenes in families correlates across different species. Overall, they highlight the fact that housekeeping families tend to be enriched with a large number of pseudogenes.