Declan Clarke
Yale University
Publication
Featured research published by Declan Clarke.
Nature | 2012
Mark Gerstein; Anshul Kundaje; Manoj Hariharan; Stephen G. Landt; Koon Kiu Yan; Chao Cheng; Xinmeng Jasmine Mu; Ekta Khurana; Joel Rozowsky; Roger P. Alexander; Renqiang Min; Pedro Alves; Alexej Abyzov; Nick Addleman; Nitin Bhardwaj; Alan P. Boyle; Philip Cayting; Alexandra Charos; David Chen; Yong Cheng; Declan Clarke; Catharine L. Eastman; Ghia Euskirchen; Seth Frietze; Yao Fu; Jason Gertz; Fabian Grubert; Arif Harmanci; Preti Jain; Maya Kasowski
Transcription factors bind in a combinatorial fashion to specify the on-and-off states of genes; the ensemble of these binding events forms a regulatory network, constituting the wiring diagram for a cell. To examine the principles of the human transcriptional regulatory network, we determined the genomic binding information of 119 transcription-related factors in over 450 distinct experiments. We found the combinatorial, co-association of transcription factors to be highly context specific: distinct combinations of factors bind at specific genomic locations. In particular, there are significant differences in the binding proximal and distal to genes. We organized all the transcription factor binding into a hierarchy and integrated it with other genomic information (for example, microRNA regulation), forming a dense meta-network. Factors at different levels have different properties; for instance, top-level transcription factors more strongly influence expression and middle-level ones co-regulate targets to mitigate information-flow bottlenecks. Moreover, these co-regulations give rise to many enriched network motifs (for example, noise-buffering feed-forward loops). Finally, more connected network components are under stronger selection and exhibit a greater degree of allele-specific activity (that is, differential binding to the two parental alleles). The regulatory information obtained in this study will be crucial for interpreting personal genome sequences and understanding basic principles of human biology and disease.
Science | 2013
Ekta Khurana; Yao Fu; Vincenza Colonna; Xinmeng Jasmine Mu; Hyun Min Kang; Tuuli Lappalainen; Andrea Sboner; Lucas Lochovsky; Jieming Chen; Arif Harmanci; Jishnu Das; Alexej Abyzov; Suganthi Balasubramanian; Kathryn Beal; Dimple Chakravarty; Daniel Challis; Yuan Chen; Declan Clarke; Laura Clarke; Fiona Cunningham; Uday S. Evani; Paul Flicek; Robert Fragoza; Erik Garrison; Richard A. Gibbs; Zeynep H. Gümüş; Javier Herrero; Naoki Kitabayashi; Yong Kong; Kasper Lage
Introduction: Plummeting sequencing costs have led to a great increase in the number of personal genomes. Interpreting the large number of variants in them, particularly in noncoding regions, is a current challenge. This is especially the case for somatic variants in cancer genomes, a large proportion of which are noncoding.
Prioritization of candidate noncoding cancer drivers based on patterns of selection. (Step 1) Filter somatic variants to exclude 1000 Genomes polymorphisms; (2) retain variants in noncoding annotations; (3) retain those in “sensitive” regions; (4) prioritize those disrupting a transcription-factor binding motif and (5) residing near the center of a biological network; (6) prioritize ones in annotation blocks mutated in multiple cancer samples.
Methods: We investigated patterns of selection in DNA elements from the ENCODE project using the full spectrum of variants from 1092 individuals in the 1000 Genomes Project (Phase 1), including single-nucleotide variants (SNVs), short insertions and deletions (indels), and structural variants (SVs). Although we analyzed broad functional annotations, such as all transcription-factor binding sites, we focused more on highly specific categories such as distal binding sites of factor ZNF274. The greater statistical power of the Phase 1 data set compared with earlier ones allowed us to differentiate the selective constraints on these categories. We also used connectivity information between elements from protein-protein-interaction and regulatory networks. We integrated all the information on selection to develop a workflow (FunSeq) to prioritize personal-genome variants on the basis of their deleterious impact. As a proof of principle, we experimentally validated and characterized a few candidate variants.
Results: We identified a specific subgroup of noncoding categories with almost as much selective constraint as coding genes: “ultrasensitive” regions. We also uncovered a number of clear patterns of selection. Elements more consistently active across tissues and both maternal and paternal alleles (in terms of allele-specific activity) are under stronger selection. Variants disruptive because of mechanistic effects on transcription-factor binding (i.e. “motif-breakers”) are selected against. Higher network connectivity (i.e. for hubs) is associated with higher constraint. Additionally, many hub promoters and regulatory elements show evidence of recent positive selection. Overall, indels and SVs follow the same pattern as SNVs; however, there are notable exceptions. For instance, enhancers are enriched for SVs formed by nonallelic homologous recombination. We integrated these patterns of selection into the FunSeq prioritization workflow and applied it to cancer variants, because they present a strong contrast to inherited polymorphisms. In particular, application to ~90 cancer genomes (breast, prostate and medulloblastoma) reveals nearly a hundred candidate noncoding drivers.
Discussion: Our approach can be readily used to prioritize variants in cancer and is immediately applicable in a precision-medicine context. It can be further improved by incorporation of larger-scale population sequencing, better annotations, and expression data from large cohorts.
Identifying Important Identifiers: Each of us has millions of sequence variations in our genomes. Signatures of purifying or negative selection should help identify which of those variations is functionally important. Khurana et al. (1235587) used sequence polymorphisms from 1092 humans across 14 populations to identify patterns of selection, especially in noncoding regulatory regions. Noncoding regions under very strong negative selection included binding sites of some chromatin and general transcription factors (TFs) and core motifs of some important TF families. Positive selection in TF binding sites tended to occur in network hub promoters. Many recurrent somatic cancer variants occurred in noncoding regulatory regions and thus might indicate mutations that drive cancer.
Regions under strong selection in the human genome identify noncoding regulatory elements with possible roles in disease.
Interpreting variants, especially noncoding ones, in the increasing number of personal genomes is challenging. We used patterns of polymorphisms in functionally annotated regions in 1092 humans to identify deleterious variants; then we experimentally validated candidates. We analyzed both coding and noncoding regions, with the former corroborating the latter. We found regions particularly sensitive to mutations (“ultrasensitive”) and variants that are disruptive because of mechanistic effects on transcription-factor binding (that is, “motif-breakers”). We also found variants in regions with higher network centrality tend to be deleterious. Insertions and deletions followed a similar pattern to single-nucleotide variants, with some notable exceptions (e.g., certain deletions and enhancers). On the basis of these patterns, we developed a computational tool (FunSeq), whose application to ~90 cancer genomes reveals nearly a hundred candidate noncoding drivers.
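The six prioritization steps listed in the Science 2013 abstract above can be illustrated with a minimal Python sketch. This is not the FunSeq implementation: the `Variant` record, its field names, and the equal-weight tally used to rank candidates are all assumptions made for illustration. The abstract states only which criteria are applied, not how they are encoded or weighted.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Variant:
    """Hypothetical somatic-variant record; every field name is an assumption."""
    variant_id: str
    in_1000_genomes: bool        # step 1: known 1000 Genomes polymorphism
    is_noncoding: bool           # step 2: lies in a noncoding annotation
    in_sensitive_region: bool    # step 3: lies in a "sensitive" region
    breaks_tf_motif: bool        # step 4: disrupts a TF binding motif
    near_network_center: bool    # step 5: near the center of a biological network
    recurrent_in_samples: bool   # step 6: annotation block mutated in multiple samples


def prioritize(variants):
    """Apply steps 1-3 as hard filters, then rank by steps 4-6.

    The ranking is an assumed equal-weight count of satisfied criteria;
    the source does not specify a scoring scheme.
    """
    candidates = [
        v for v in variants
        if not v.in_1000_genomes and v.is_noncoding and v.in_sensitive_region
    ]

    def score(v):
        return (int(v.breaks_tf_motif)
                + int(v.near_network_center)
                + int(v.recurrent_in_samples))

    # sorted() is stable, so variants with equal scores keep their input order.
    return sorted(candidates, key=score, reverse=True)
```

Treating the first three steps as filters and the last three as ranking signals follows the abstract's wording ("filter", "retain", "prioritize"). A real pipeline would derive each flag from annotation files instead of receiving precomputed booleans.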
Protein Science | 2011
Nitin Bhardwaj; Alexej Abyzov; Declan Clarke; Chong Shou; Mark Gerstein
The integration of molecular networks with other types of data, such as changing levels of gene expression or protein‐structural features, can provide richer information about interactions than the simple node‐and‐edge representations commonly used in the network community. For example, the mapping of 3D‐structural data onto networks enables classification of proteins into singlish‐ or multi‐interface hubs (depending on whether they have >2 interfaces). Similarly, interactions can be classified as permanent or transient, depending on whether their interface is used by only one or by multiple partners. Here, we incorporate an additional dimension into molecular networks: dynamic conformational changes. We parse the entire PDB structural databank for alternate conformations of proteins and map these onto the protein interaction network, to compile a first version of the Dynamic Structural Interaction Network (DynaSIN). We make this network available as a readily downloadable resource file, and we then use it to address a variety of downstream questions. In particular, we show that multi‐interface hubs display a greater degree of conformational change than do singlish‐interface ones; thus, they show more plasticity which perhaps enables them to utilize more interfaces for interactions. We also find that transient associations involve smaller conformational changes than permanent ones. Although this may appear counterintuitive, it is understandable in the following framework: as proteins involved in transient interactions shuttle between interchangeable associations, they interact with domains that are similar to each other and so do not require drastic structural changes for their activity. We provide evidence for this hypothesis through showing that interfaces involved in transient interactions bind fewer classes of domains than those in a control set.
Structure | 2016
Declan Clarke; Anurag Sethi; Shantao Li; Sushant Kumar; Richard W.F. Chang; Jieming Chen; Mark Gerstein
The rapidly growing volume of data being produced by next-generation sequencing initiatives is enabling more in-depth analyses of conservation than previously possible. Deep sequencing is uncovering disease loci and regions under selective constraint, despite the fact that intuitive biophysical reasons for such constraint are sometimes absent. Allostery may often provide the missing explanatory link. We use models of protein conformational change to identify allosteric residues by finding essential surface pockets and information-flow bottlenecks, and we develop a software tool that enables users to perform this analysis on their own proteins of interest. Though fundamentally 3D-structural in nature, our analysis is computationally fast, thereby allowing us to run it across the PDB and to evaluate general properties of predicted allosteric residues. We find that these tend to be conserved over diverse evolutionary time scales. Finally, we highlight examples of allosteric residues that help explain poorly understood disease-associated variants.
Current Opinion in Structural Biology | 2015
Anurag Sethi; Declan Clarke; Jieming Chen; Sushant Kumar; Timur R. Galeev; Lynne Regan; Mark Gerstein
Structure has traditionally been interrelated with sequence, usually in the framework of comparing sequences across species sharing a common fold. However, the nature of information within the sequence and structure databases is evolving, changing the type of comparisons possible. In particular, we now have a vast amount of personal genome sequences from human populations and a greater fraction of new structures contain interacting proteins within large complexes. Consequently, we have to recast our conception of sequence conservation and its relation to structure, for example by focusing more on selection within the human population. Moreover, within structural biology there is less emphasis on the discovery of novel folds and more on relating structures to networks of protein interactions. We cover this changing mindset here.
Nucleic Acids Research | 2016
Sushant Kumar; Declan Clarke; Mark Gerstein
Population-scale sequencing is increasingly uncovering large numbers of rare single-nucleotide variants (SNVs) in coding regions of the genome. The rarity of these variants makes it challenging to evaluate their deleteriousness with conventional phenotype–genotype associations. Protein structures provide a way of addressing this challenge. Previous efforts have focused on globally quantifying the impact of SNVs on protein stability. However, local perturbations may severely impact protein functionality without strongly disrupting global stability (e.g. in relation to catalysis or allostery). Here, we describe a workflow in which localized frustration, quantifying unfavorable local interactions, is employed as a metric to investigate such effects. Using this workflow on the Protein Data Bank, we find that frustration produces many immediately intuitive results: for instance, disease-related SNVs create stronger changes in localized frustration than non-disease-related variants, and rare SNVs tend to disrupt local interactions to a larger extent than common variants. Less obviously, we observe that somatic SNVs associated with oncogenes and tumor suppressor genes (TSGs) induce very different changes in frustration. In particular, those associated with TSGs change the frustration more in the core than the surface (by introducing loss-of-function events), whereas those associated with oncogenes manifest the opposite pattern, creating gain-of-function events.
Proceedings of the National Academy of Sciences of the United States of America | 2011
Nitin Bhardwaj; Declan Clarke; Mark Gerstein
Since its birth, systems biology has gained a great deal from the protocols devised to study phenomena at the level of single proteins and nucleic acids. Such protocols find broad markets and utility at higher levels of biological organization, from next-generation sequencing, which uses modified nucleotides and fluorescent identifiers (1), to ChIP-seq analysis, which identifies histone modifications and binding sites in protein–DNA interactions (2). In PNAS, Sivaramakrishnan and Spudich introduce a system for interrogating interactions between pairs of proteins, domains, and peptides (3), and it is very possible that their invention will find applicability in the construction and analysis of large-scale protein–protein interaction networks (4).
Bioinformatics | 2012
Lukas Habegger; Suganthi Balasubramanian; David Chen; Ekta Khurana; Andrea Sboner; Arif Harmanci; Joel Rozowsky; Declan Clarke; Michael Snyder; Mark Gerstein
Journal of Structural Biology | 2012
Declan Clarke; Nitin Bhardwaj; Mark Gerstein
Archive | 1984
Declan Clarke; Anna Ritchief; Graham Ritchie; Margaret F Bruce; G F Hamilton; N W Kerr; Dorothy A Lunt