Publication


Featured research published by Dov Greenbaum.


Trends in Genetics | 2002

Bridging structural biology and genomics: assessing protein interaction data with known complexes

A. Edwards; Bart Kus; Ronald Jansen; Dov Greenbaum; Jack Greenblatt; Mark Gerstein

Currently, there is a major effort to map protein-protein interactions on a genome-wide scale. The utility of the resulting interaction networks will depend on the reliability of the experimental methods and the coverage of the approaches. Known macromolecular complexes provide a defined and objective set of protein interactions with which to compare biochemical and genetic data for validation. Here, we show that a significant fraction of the protein-protein interactions in genome-wide datasets, as well as many of the individual interactions reported in the literature, are inconsistent with the known 3D structures of three recent complexes (RNA polymerase II, Arp2/3 and the proteasome). Furthermore, comparison among genome-wide datasets, and between them and a larger (but less well resolved) group of 174 complexes, also shows marked inconsistencies. Finally, individual interaction datasets, being inherently noisy, are best used when integrated together, and we show how simple Bayesian approaches can combine them, significantly decreasing error rate.
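The Bayesian integration mentioned at the end of this abstract can be illustrated with a minimal naive Bayes sketch. This is not the authors' implementation: the function names, the independence assumption between datasets, and every number in the example are hypothetical and chosen only to show how evidence from several noisy datasets multiplies into a single posterior.

```python
from math import prod


def likelihood_ratio(true_positive_rate: float, false_positive_rate: float) -> float:
    """How much more often a dataset reports a pair when it truly interacts."""
    return true_positive_rate / false_positive_rate


def posterior_odds(prior_odds: float, likelihood_ratios: list[float]) -> float:
    """Naive Bayes update: assumes the datasets are conditionally independent."""
    return prior_odds * prod(likelihood_ratios)


def odds_to_probability(odds: float) -> float:
    """Convert odds in favour of an interaction into a probability."""
    return odds / (1.0 + odds)


if __name__ == "__main__":
    # Hypothetical values, not taken from the paper.
    prior = 1 / 600                          # assumed prior odds of interaction
    lr_a = likelihood_ratio(0.30, 0.010)     # pair reported by dataset A
    lr_b = likelihood_ratio(0.20, 0.005)     # pair reported by dataset B
    combined = posterior_odds(prior, [lr_a, lr_b])
    print(f"posterior probability: {odds_to_probability(combined):.3f}")
```

Under these assumed rates, neither dataset alone lifts the odds above even, while the two together do, which is the sense in which integration reduces the error rate.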


PLOS Computational Biology | 2011

Genomics and Privacy: Implications of the New Reality of Closed Data for the Field

Dov Greenbaum; Andrea Sboner; Xinmeng Jasmine Mu; Mark Gerstein

Open source and open data have been driving forces in bioinformatics in the past. However, privacy concerns may soon change the landscape, limiting future access to important data sets, including personal genomics data. Here we survey this situation in some detail, describing, in particular, how the large scale of the data from personal genomic sequencing makes it especially hard to share data, exacerbating the privacy problem. We also go over various aspects of genomic privacy: first, there is basic identifiability of subjects having their genome sequenced. However, even for individuals who have consented to be identified, there is the prospect of very detailed future characterization of their genotype, which, unanticipated at the time of their consent, may be more personal and invasive than the release of their medical records. We go over various computational strategies for dealing with the issue of genomic privacy. One can “slice” and reformat datasets to allow them to be partially shared while securing the most private variants. This is particularly applicable to functional genomics information, which can be largely processed without variant information. For handling the most private data there are a number of legal and technological approaches, for example, modifying the informed consent procedure to acknowledge that privacy cannot be guaranteed, and/or employing a secure cloud computing environment. Cloud computing in particular may allow access to the data in a more controlled fashion than the current practice of downloading and computing on large datasets. Furthermore, it may be particularly advantageous for small labs, given that the burden of many privacy issues falls disproportionately on them in comparison to large corporations and genome centers. Finally, we discuss how education of future genetics researchers will be important, with curriculums emphasizing privacy and data security. However, teaching personal genomics with identifiable subjects in the university setting will, in turn, create additional privacy issues and social conundrums.


Proteins | 2002

Structural genomics analysis: Characteristics of atypical, common, and horizontally transferred folds

Hedi Hegyi; Jimmy Lin; Dov Greenbaum; Mark Gerstein

We conducted a structural genomics analysis of the folds and structural superfamilies in the first 20 completely sequenced genomes by focusing on the patterns of fold usage and trying to identify structural characteristics of typical and atypical folds. We assigned folds to sequences using PSI-BLAST, run with a systematic protocol to reduce the amount of computational overhead. On average, folds could be assigned to about a fourth of the ORFs in the genomes and about a fifth of the amino acids in the proteomes. More than 80% of all the folds in the SCOP structural classification were identified in one of the 20 organisms, with worm and E. coli having the largest number of distinct folds. Folds are particularly effective at comprehensively measuring levels of gene duplication, because they group together even very remote homologues. Using folds, we find the average level of duplication varies depending on the complexity of the organism, ranging from 2.4 in M. genitalium to 32 for the worm, values significantly higher than those observed based purely on sequence similarity. We rank the common folds in the 20 organisms, finding that the top three are the P-loop NTP hydrolase, the ferredoxin fold, and the TIM-barrel, and discuss in detail the many factors that affect and bias these rankings. We also identify atypical folds that are “unique” to one of the organisms in our study and compare the characteristics of these folds with the most common ones. We find that common folds tend to be more multifunctional and associated with more regular, “symmetrical” structures than the unique ones. In addition, many of the unique folds are associated with proteins involved in cell defense (e.g., toxins). We analyze specific patterns of fold occurrence in the genomes by associating some of them with instances of horizontal transfer and others with gene loss. In particular, we find three possible examples of transfer between archaea and bacteria and six between eukarya and bacteria. We make available our detailed results at http://genecensus.org/20. Proteins 2002;47:126–141.


American Journal of Bioethics | 2008

Genomic Anonymity: Have We Already Lost It?

Dov Greenbaum; Jiang Du; Mark Gerstein

Hull and colleagues (2008) discuss the utility of the current regulatory distinction between identifiable and nonidentifiable genomic information, particularly given the seemingly anomalous preferences of their surveyed patient population. As the authors note, this regulatory distinction will become even less meaningful with the proliferation of genomic databases. Particularly as industries such as personal genomics expand — flooding both private and public databases with readily identifiable genomic data — they will effectively prevent an ever-growing number of individuals from remaining genetically anonymous (Lowrance and Collins 2007). In fact, recent research has already shown that individual genomes can be readily identified out of larger mixed groups of publicly available data from genome wide association studies using only a small subset of one’s genome (Homer et al. 2008). Once it’s known that a person has participated in a genome wide association study, it becomes fairly straightforward to use their or their relative’s genomic data — which may well be made available through personal genomics — to re-identify that individual (National Institutes of Health [NIH] 2008). The general expanse of genomics into our medical system, both through personal genomics and also through other evolving biomedical technologies such as targeted personalized medicine, also raises other non-trivial privacy concerns both for the patient herself but also for her extended family that share much of her genomic complement.


Journal of Biomedical Informatics | 2007

An interdepartmental Ph.D. program in computational biology and bioinformatics: The Yale perspective

Mark Gerstein; Dov Greenbaum; Kei-Hoi Cheung; Perry L. Miller

Computational biology and bioinformatics (CBB), terms often used interchangeably, represent a rapidly evolving biological discipline. With the clear potential for discovery and innovation, and the need to deal with the deluge of biological data, many academic institutions are committing significant resources to develop CBB research and training programs. Yale formally established an interdepartmental Ph.D. program in CBB in May 2003. This paper describes Yale's program, discussing the scope of the field, the program's goals and curriculum, as well as a number of issues that arose in implementing the program. (Further updated information is available from the program's website, www.cbb.yale.edu.)


American Journal of Bioethics | 2011

The Role of Cloud Computing in Managing the Deluge of Potentially Private Genetic Data

Dov Greenbaum; Mark Gerstein

Schonfeld and colleagues (2011) note that patient privacy concerns are much broader than what is protected under the current regulatory framework. Plummeting costs in DNA sequencing will allow us to collect and analyze cohorts of whole genomes for genome-wide association studies, among other analyses. These genetic data have the potential to be more personal and more informative than current medical records. The likelihood that these effectively unanonymizable data sets will be accessed and analyzed by scientific groups around the planet necessitates either a paradigm shift in the way the science is done, and/or revised understandings of privacy and informed consent. We propose both. Ironically, plummeting sequencing costs have become the bane of genomics researchers. That, and the concomitant rocketing of computational power and falling overhead for data storage, have given scientists the ability to create large genetic data sets over the course of their research. Most laboratories have become, or will soon become, oversubscribed and underpowered vis-à-vis these data sets. Lacking the software and the computational wherewithal to fully appreciate the scientific power of the data, many researchers end up leaving much of the minable information untouched or underutilized. This data deluge, like the illustration mentioned by Schonfeld and colleagues, also raises numerous instances where current regulations fail to protect otherwise private and protected personal information. An unprecedented hole in the protection of private data arises out of the creation of large-scale genomics studies. These data will in all likelihood continue to provide a greater depth of personal information as our ability to mine the sets grows. Often subjects submitting genomic samples are unaware of


Genome Biology | 2017

Structuring supplemental materials in support of reproducibility

Dov Greenbaum; Joel Rozowsky; Victoria C. Stodden; Mark Gerstein

Supplements are increasingly important to the scientific record, particularly in genomics. However, they are often underutilized. Optimally, supplements should make results findable, accessible, interoperable, and reusable (i.e., “FAIR”). Moreover, properly off-loading to them the data and detail in a paper could make the main text more readable. We propose a hierarchical organization for supplements, with some parts paralleling and “shadowing” the main text and other elements branching off from it, and we suggest a specific formatting to make this structure explicit. Furthermore, sections of the supplement could be presented in multiple scientific “dialects”, including machine-readable and lay-friendly formats.


Nature Biotechnology | 2003

A universal legal framework as a prerequisite for database interoperability

Dov Greenbaum; Mark Gerstein

Databases are fundamental to modern scientific research, both as archives and, via manipulation of their contents, as research tools in their own right. One obvious example is the annotation of genomes, requiring systematic downloading, reformatting, standardizing and combining of data in a unified computational framework. This process requires both repeated access to databases, and the ability to show the transformed data, repackaged in a new format, alongside the evidence—the original data sets. It is obvious that interoperation of databases through universal scientific formats and standards facilitates research; data are ineffectual if scattered among incompatible resources. Not as obvious is the need for robust legal frameworks to ensure interoperation. The ambiguity of the present copyright laws governing the protection of databases creates a situation where researchers are unclear about their rights to extract and combine data; and database owners, unsure of how laws safeguard their information, overprotect their data with licenses and technological mechanisms that impede interoperation. Much of the current international database debate can be described as responsive volleys of legislation across the Atlantic, each side trying to establish an industrywide level of protection. Thus, responding to judicial and European developments in database protection, the US Congress (Washington, DC, USA) is currently attempting to augment weak copyright protections. In doing so, US lawmakers need to consider the repercussions of their legislation on scientific research. There is no doubt that database protection is necessary. However, science advances through building upon previous research. Thus, scientific researchers (both academic and commercial) who depend on access to these databases require legislation that is narrow in scope and broad in academic exemptions, and that encourages data shar-


Pharmacogenomics Journal | 2001

Global perspectives on proteins: comparing genomes in terms of folds, pathways and beyond

Rajdeep Das; Jochen Junker; Dov Greenbaum; Mark Gerstein

The sequencing of complete genomes provides us with a global view of all the proteins in an organism. Proteomic analysis can be done on a purely sequence-based level, with a focus on finding homologues and grouping them into families and clusters of orthologs. However, incorporating protein structure into this analysis provides valuable simplification; it allows one to collect together very distantly related sequences, thus condensing the proteome into a minimal number of ‘parts.’ We describe issues related to surveying proteomes in terms of structural parts, including methods for fold assignment and formats for comparisons (e.g., top-10 lists and whole-genome trees), and show how biases in the databases and in sampling can affect these surveys. We illustrate our main points through a case study on the unique protein properties evident in many thermophile genomes (e.g., more salt bridges). Finally, we discuss metabolic pathways as an even greater simplification of genomes. In comparison to folds, these allow the organization of many more genes into coherent systems, yet can nevertheless be understood in many of the same terms.


American Journal of Bioethics | 2014

If You Don’t Know Where You Are Going, You Might Wind Up Someplace Else: Incidental Findings in Recreational Personal Genomics

Dov Greenbaum


Collaboration


Dive into Dov Greenbaum's collaborations.

Top Co-Authors

Jiang Qian

Johns Hopkins University School of Medicine
