Harm van Bakel
Icahn School of Medicine at Mount Sinai
Publications
Featured research published by Harm van Bakel.
American Journal of Human Genetics | 2006
Lude Franke; Harm van Bakel; Like Fokkens; Edwin D. de Jong; Michael Egmont-Petersen; Cisca Wijmenga
Most common genetic disorders have a complex inheritance and may result from variants in many genes, each contributing only weak effects to the disease. Pinpointing these disease genes within the myriad of susceptibility loci identified in linkage studies is difficult because these loci may contain hundreds of genes. However, in any disorder, most of the disease genes will be involved in only a few different molecular pathways. If we know something about the relationships between the genes, we can assess whether some genes (which may reside in different loci) functionally interact with each other, indicating a joint basis for the disease etiology. There are various repositories of information on pathway relationships. To consolidate this information, we developed a functional human gene network that integrates information on genes and the functional relationships between genes, based on data from the Kyoto Encyclopedia of Genes and Genomes, the Biomolecular Interaction Network Database, Reactome, the Human Protein Reference Database, the Gene Ontology database, predicted protein-protein interactions, human yeast two-hybrid interactions, and microarray co-expressions. We applied this network to interrelate positional candidate genes from different disease loci and then tested 96 heritable disorders for which the Online Mendelian Inheritance in Man database reported at least three disease genes. Artificial susceptibility loci, each containing 100 genes, were constructed around each disease gene, and we used the network to rank these genes on the basis of their functional interactions. By following up the top five genes per artificial locus, we were able to detect at least one known disease gene in 54% of the loci studied, representing a 2.8-fold increase over random selection. 
This suggests that our method can significantly reduce the cost and effort of pinpointing true disease genes in analyses of disorders for which numerous loci have been reported but for which most of the genes are unknown.
Molecular Cell | 2008
Gwenael Badis; Esther T. Chan; Harm van Bakel; Lourdes Peña-Castillo; Desiree Tillo; Kyle Tsui; Clayton D. Carlson; Andrea J. Gossett; Michael J. Hasinoff; Christopher L. Warren; Marinella Gebbia; Shaheynoor Talukder; Ally Yang; Sanie Mnaimneh; Dimitri Terterov; David Coburn; Ai Li Yeo; Zhen Xuan Yeo; Neil D. Clarke; Jason D. Lieb; Aseem Z. Ansari; Corey Nislow; Timothy R. Hughes
The sequence specificity of DNA-binding proteins is the primary mechanism by which the cell recognizes genomic features. Here, we describe systematic determination of yeast transcription factor DNA-binding specificities. We obtained binding specificities for 112 DNA-binding proteins representing 19 distinct structural classes. One-third of the binding specificities have not been previously reported. Several binding sequences have striking genomic distributions relative to transcription start sites, supporting their biological relevance and suggesting a role in promoter architecture. Among these are Rsc3 binding sequences, containing the core CGCG, which are found preferentially approximately 100 bp upstream of transcription start sites. Mutation of RSC3 results in a dramatic increase in nucleosome occupancy in hundreds of proximal promoters containing a Rsc3 binding element, but has little impact on promoters lacking Rsc3 binding sequences, indicating that Rsc3 plays a broad role in targeting nucleosome exclusion at yeast promoters.
PLOS Biology | 2010
Harm van Bakel; Corey Nislow; Benjamin J. Blencowe; Timothy R. Hughes
Short-read RNA sequencing in mouse and human tissues shows that most transcripts are encoded within or nearby known genes and that most of the genome is not transcribed.
EMBO Reports | 2003
Jeroen van de Peppel; Patrick Kemmeren; Harm van Bakel; Marijana Radonjic; Dik van Leenen; Frank C. P. Holstege
Expression profiling is a universal tool, with a range of applications that benefit from the accurate determination of differential gene expression. To allow normalization using endogenous transcript levels, current microarray analyses assume that relatively few transcripts vary, or that any changes that occur are balanced. When normalization using endogenous genes is carried out, changes in expression levels are calculated relative to the behaviour of most of the transcripts. This does not reflect absolute changes if global shifts in messenger RNA populations occur. Using external RNA controls, we have set up microarray experiments to monitor global changes. The levels of most mRNAs were found to change during yeast stationary phase and human heat shock when external controls were included. Even small global changes had a significant effect on the number of genes reported as being differentially expressed. This suggests that global mRNA changes occur more frequently than is assumed at present, and shows that monitoring such effects may be important for the accurate determination of changes in gene expression.
Proceedings of the National Academy of Sciences of the United States of America | 2010
Blair R. G. Gordon; Yifei Li; Linru Wang; Anna Sintsova; Harm van Bakel; Songhai Tian; William Wiley Navarre; Bin Xia; Jun Liu
Bacterial nucleoid-associated proteins play important roles in chromosome organization and global gene regulation. We find that Lsr2 of Mycobacterium tuberculosis is a unique nucleoid-associated protein that binds AT-rich regions of the genome, including genomic islands acquired by horizontal gene transfer and regions encoding major virulence factors, such as the ESX secretion systems, the lipid virulence factors PDIM and PGL, and the PE/PPE families of antigenic proteins. Comparison of genome-wide binding data with expression data indicates that Lsr2 binding results in transcriptional repression. Domain-swapping experiments demonstrate that Lsr2 has an N-terminal dimerization domain and a C-terminal DNA-binding domain. Nuclear magnetic resonance analysis of the DNA-binding domain of Lsr2 and its interaction with DNA reveals a unique structure and a unique mechanism that enables Lsr2 to discriminately target AT-rich sequences through interactions with the minor groove of DNA. Taken together, we provide evidence that mycobacteria have employed a structurally distinct molecule with an apparently different DNA recognition mechanism to achieve a function similar to the Enterobacteriaceae H-NS, likely coordinating global gene regulation and virulence in this group of medically important bacteria.
Proceedings of the National Academy of Sciences of the United States of America | 2013
Kin Fai Au; Vittorio Sebastiano; Pegah Tootoonchi Afshar; Jens Durruthy Durruthy; Lawrence Lee; Brian A. Williams; Harm van Bakel; Eric E. Schadt; Renee Reijo-Pera; Jason G. Underwood; Wing Hung Wong
Significance: Isoform identification and discovery are an important goal for transcriptome analysis because the majority of human genes express multiple isoforms with context- and tissue-specific functions. Better annotation of isoforms will also benefit downstream analysis such as expression quantification. Current RNA-Seq methods based on short-read sequencing are not reliable for isoform discovery. In this study we developed a new method based on the combined analysis of short reads and long reads generated, respectively, by second- and third-generation sequencing and applied this method to obtain a comprehensive characterization of the transcriptome of the human embryonic stem cell. The results showed that large gain in sensitivity and specificity can be achieved with this strategy.
Although transcriptional and posttranscriptional events are detected in RNA-Seq data from second-generation sequencing, full-length mRNA isoforms are not captured. On the other hand, third-generation sequencing, which yields much longer reads, has current limitations of lower raw accuracy and throughput. Here, we combine second-generation sequencing and third-generation sequencing with a custom-designed method for isoform identification and quantification to generate a high-confidence isoform dataset for human embryonic stem cells (hESCs). We report 8,084 RefSeq-annotated isoforms detected as full-length and an additional 5,459 isoforms predicted through statistical inference. Over one-third of these are novel isoforms, including 273 RNAs from gene loci that have not previously been identified. Further characterization of the novel loci indicates that a subset is expressed in pluripotent cells but not in diverse fetal and adult tissues; moreover, their reduced expression perturbs the network of pluripotency-associated genes. Results suggest that gene identification, even in well-characterized human cell lines and tissues, is likely far from complete.
Cell | 2014
Patrick Kemmeren; Katrin Sameith; Loes A.L. van de Pasch; Joris J. Benschop; Tineke L. Lenstra; Thanasis Margaritis; Eoghan O’Duibhir; Eva Apweiler; Sake van Wageningen; Cheuk W. Ko; Sebastiaan van Heesch; Mehdi M. Kashani; Giannis Ampatziadis-Michailidis; Mariel O. Brok; Nathalie Brabers; Anthony J. Miles; Diane Bouwmeester; Sander R. van Hooff; Harm van Bakel; Erik Sluiters; Linda V. Bakker; Berend Snel; Philip Lijnzaad; Dik van Leenen; Marian J. A. Groot Koerkamp; Frank C. P. Holstege
To understand regulatory systems, it would be useful to uniformly determine how different components contribute to the expression of all other genes. We therefore monitored mRNA expression genome-wide, for individual deletions of one-quarter of yeast genes, focusing on (putative) regulators. The resulting genetic perturbation signatures reflect many different properties. These include the architecture of protein complexes and pathways, identification of expression changes compatible with viability, and the varying responsiveness to genetic perturbation. The data are assembled into a genetic perturbation network that shows different connectivities for different classes of regulators. Four feed-forward loop (FFL) types are overrepresented, including incoherent type 2 FFLs that likely represent feedback. Systematic transcription factor classification shows a surprisingly high abundance of gene-specific repressors, suggesting that yeast chromatin is not as generally restrictive to transcription as is often assumed. The data set is useful for studying individual genes and for discovering properties of an entire regulatory system.
Nature Neuroscience | 2015
Schahram Akbarian; Chunyu Liu; James A. Knowles; Flora M. Vaccarino; Peggy J. Farnham; Gregory E. Crawford; Andrew E. Jaffe; Dalila Pinto; Stella Dracheva; Daniel H. Geschwind; Jonathan Mill; Angus C. Nairn; Alexej Abyzov; Sirisha Pochareddy; Shyam Prabhakar; Sherman M. Weissman; Patrick F. Sullivan; Matthew W. State; Zhiping Weng; Mette A. Peters; Kevin P. White; Mark Gerstein; Anahita Amiri; Chris Armoskus; Allison E. Ashley-Koch; Taejeong Bae; Andrea Beckel-Mitchener; Benjamin P. Berman; Gerhard A. Coetzee; Gianfilippo Coppola
Recent research on disparate psychiatric disorders has implicated rare variants in genes involved in global gene regulation and chromatin modification, as well as many common variants located primarily in regulatory regions of the genome. Understanding precisely how these variants contribute to disease will require a deeper appreciation for the mechanisms of gene regulation in the developing and adult human brain. The PsychENCODE project aims to produce a public resource of multidimensional genomic data using tissue- and cell type–specific samples from approximately 1,000 phenotypically well-characterized, high-quality healthy and disease-affected human post-mortem brains, as well as functionally characterize disease-associated regulatory elements and variants in model systems. We are beginning with a focus on autism spectrum disorder, bipolar disorder and schizophrenia, and expect that this knowledge will apply to a wide variety of psychiatric disorders. This paper outlines the motivation and design of PsychENCODE.
Nucleic Acids Research | 2011
Mark B. Stead; Sarah Marshburn; Bijoy K. Mohanty; Joydeep Mitra; Lourdes Peña Castillo; Debashish Ray; Harm van Bakel; Timothy R. Hughes; Sidney R. Kushner
Tiling microarrays have proven to be a valuable tool for gaining insights into the transcriptomes of microbial organisms grown under various nutritional or stress conditions. Here, we describe the use of such an array, constructed at the level of 20 nt resolution for the Escherichia coli MG1655 genome, to observe genome-wide changes in the steady-state RNA levels in mutants defective in either RNase E or RNase III. The array data were validated by comparison to previously published results for a variety of specific transcripts as well as independent northern analysis of additional mRNAs and sRNAs. In the absence of RNase E, 60% of the annotated coding sequences showed either increases or decreases in their steady-state levels. In contrast, only 12% of the coding sequences were affected in the absence of RNase III. Unexpectedly, many coding sequences showed decreased abundance in the RNase E mutant, while more than half of the annotated sRNAs showed changes in abundance. Furthermore, the steady-state levels of many transcripts showed overlapping effects of both ribonucleases. Data are also presented demonstrating how the arrays were used to identify potential new genes, RNase III cleavage sites and the direct or indirect control of specific biological pathways.
Nucleic Acids Research | 2011
Kathy N. Lam; Harm van Bakel; Anton van der Ven; Timothy R. Hughes
C2H2 zinc fingers (C2H2-ZFs) are the most prevalent type of vertebrate DNA-binding domain, and typically appear in tandem arrays (ZFAs), with sequential C2H2-ZFs each contacting three (or more) sequential bases. C2H2-ZFs can be assembled in a modular fashion, providing one explanation for their remarkable evolutionary success. Given a set of modules with defined three-base specificities, modular assembly also presents a way to construct artificial proteins with specific DNA-binding preferences. However, a recent survey of a large number of three-finger ZFAs engineered by modular assembly reported high failure rates (∼70%), casting doubt on the generality of modular assembly. Here, we used protein-binding microarrays to analyze 28 ZFAs that failed in the aforementioned study. Most (17) preferred specific sequences, which in all but one case resembled the intended target sequence. Like natural ZFAs, the engineered ZFAs typically yielded degenerate motifs, binding dozens to hundreds of related individual sequences. Thus, the failure of these proteins in previous assays is not due to lack of sequence-specific DNA-binding activity. Our findings underscore the relevance of individual C2H2-ZF sequence specificities within tandem arrays, and support the general ability of modular assembly to produce ZFAs with sequence-specific DNA-binding activity.