Publication


Featured research published by Hayley Dingerdissen.


Database | 2015

BioXpress: an integrated RNA-seq-derived gene expression database for pan-cancer analysis

Quan Wan; Hayley Dingerdissen; Yu Fan; Naila Gulzar; Yang Pan; Tsung-Jung Wu; Cheng Yan; Haichen Zhang; Raja Mazumder

BioXpress is a gene expression and cancer association database in which the expression levels are mapped to genes using RNA-seq data obtained from The Cancer Genome Atlas, International Cancer Genome Consortium, Expression Atlas and publications. The BioXpress database includes expression data from 64 cancer types, 6361 patients and 17 469 genes with 9513 of the genes displaying differential expression between tumor and normal samples. In addition to data directly retrieved from RNA-seq data repositories, manual biocuration of publications supplements the available cancer association annotations in the database. All cancer types are mapped to Disease Ontology terms to facilitate a uniform pan-cancer analysis. The BioXpress database is easily searched using HUGO Gene Nomenclature Committee gene symbol, UniProtKB/RefSeq accession or, alternatively, can be queried by cancer type with specified significance filters. This interface along with availability of pre-computed downloadable files containing differentially expressed genes in multiple cancers enables straightforward retrieval and display of a broad set of cancer-related genes. Database URL: http://hive.biochemistry.gwu.edu/tools/bioxpress


PLOS ONE | 2014

HIVE-Hexagon: High-Performance, Parallelized Sequence Alignment for Next-Generation Sequencing Data Analysis

Luis V. Santana-Quintero; Hayley Dingerdissen; Jean Thierry-Mieg; Raja Mazumder; Vahan Simonyan

Due to the size of Next-Generation Sequencing data, the computational challenge of sequence alignment has been vast. Inexact alignments can take up to 90% of total CPU time in bioinformatics pipelines. High-performance Integrated Virtual Environment (HIVE), a cloud-based environment optimized for storage and analysis of extra-large data, presents an algorithmic solution: the HIVE-hexagon DNA sequence aligner. HIVE-hexagon implements novel approaches to exploit both characteristics of sequence space and CPU, RAM and Input/Output (I/O) architecture to quickly compute accurate alignments. Key components of HIVE-hexagon include non-redundification and sorting of sequences; floating diagonals of linearized dynamic programming matrices; and consideration of cross-similarity to minimize computations. Availability: https://hive.biochemistry.gwu.edu/hive/


Nucleic Acids Research | 2014

Human germline and pan-cancer variomes and their distinct functional profiles.

Yang Pan; Konstantinos Karagiannis; Haichen Zhang; Hayley Dingerdissen; Amirhossein Shamsaddini; Quan Wan; Vahan Simonyan; Raja Mazumder

Identification of non-synonymous single nucleotide variations (nsSNVs) has exponentially increased due to advances in Next-Generation Sequencing technologies. The functional impacts of these variations have been difficult to ascertain because the corresponding knowledge about sequence functional sites is quite fragmented. It is clear that mapping of variations to sequence functional features can help us better understand the pathophysiological role of variations. In this study, we investigated the effect of nsSNVs on more than 17 common types of post-translational modification (PTM) sites, active sites and binding sites. Out of 1 705 285 distinct nsSNVs on 259 216 functional sites we identified 38 549 variations that significantly affect 10 major functional sites. Furthermore, we found distinct patterns of site disruptions due to germline and somatic nsSNVs. Pan-cancer analysis across 12 different cancer types led to the identification of 51 genes with 106 nsSNV affected functional sites found in 3 or more cancer types. 13 of the 51 genes overlap with previously identified Significantly Mutated Genes (Nature. 2013 Oct 17;502(7471)). 62 mutations in these 13 genes affecting functional sites such as DNA, ATP binding and various PTM sites occur across several cancers and can be prioritized for additional validation and investigations.


Database | 2016

High-performance integrated virtual environment (HIVE): a robust infrastructure for next-generation sequence data analysis

Vahan Simonyan; Konstantin Chumakov; Hayley Dingerdissen; William J. Faison; Scott Goldweber; Anton Golikov; Naila Gulzar; Konstantinos Karagiannis; Phuc Vinh Nguyen Lam; Thomas Maudru; Olesja Muravitskaja; Ekaterina Osipova; Yang Pan; Alexey Pschenichnov; Alexandre Rostovtsev; Luis V. Santana-Quintero; Krista Smith; Elaine E. Thompson; Valery Tkachenko; John Torcivia-Rodriguez; Alin Voskanian; Quan Wan; Jing Wang; Tsung-Jung Wu; Carolyn A. Wilson; Raja Mazumder

The High-performance Integrated Virtual Environment (HIVE) is a distributed storage and compute environment designed primarily to handle next-generation sequencing (NGS) data. This multicomponent cloud infrastructure provides secure web access for authorized users to deposit, retrieve, annotate and compute on NGS data, and to analyse the outcomes using web interface visual environments appropriately built in collaboration with research and regulatory scientists and other end users. Unlike many massively parallel computing environments, HIVE uses a cloud control server which virtualizes services, not processes. It is both very robust and flexible due to the abstraction layer introduced between computational requests and operating system processes. The novel paradigm of moving computations to the data, instead of moving data to computational nodes, has proven to be significantly less taxing for both hardware and network infrastructure. The honeycomb data model developed for HIVE integrates metadata into an object-oriented model. Its distinction from other object-oriented databases is in the additional implementation of a unified application program interface to search, view and manipulate data of all types. This model simplifies the introduction of new data types, thereby minimizing the need for database restructuring and streamlining the development of new integrated information systems. The honeycomb model employs a highly secure hierarchical access control and permission system, allowing determination of data access privileges in a finely granular manner without flooding the security subsystem with a multiplicity of rules. HIVE infrastructure will allow engineers and scientists to perform NGS analysis in a manner that is both efficient and secure. HIVE is actively supported in public and private domains, and project collaborations are welcomed. Database URL: https://hive.biochemistry.gwu.edu


FEBS Journal | 2013

Proteome‐wide analysis of nonsynonymous single‐nucleotide variations in active sites of human proteins

Hayley Dingerdissen; Mona Motwani; Konstantinos Karagiannis; Vahan Simonyan; Raja Mazumder

An enzyme's active site is essential to normal protein activity such that any disruptions at this site may lead to dysfunction and disease. Nonsynonymous single‐nucleotide variations (nsSNVs), which alter the amino acid sequence, are one type of disruption that can alter the active site. When this occurs, it is assumed that enzyme activity will vary because of the criticality of the site to normal protein function. We integrate nsSNV data and active site annotations from curated resources to identify all active‐site‐impacting nsSNVs in the human genome and search for all pathways observed to be associated with this data set to assess the likely consequences. We find that there are 934 unique nsSNVs that occur at the active sites of 559 proteins. Analysis of the nsSNV data shows an over‐representation of arginine and an under‐representation of cysteine, phenylalanine and tyrosine when comparing the list of nsSNV‐impacted active site residues with the list of all possible proteomic active site residues, implying a potential bias for or against variation of these residues at the active site. Clustering analysis shows an abundance of hydrolases and transferases. Pathway and functional analysis shows several pathways over‐ or under‐represented in the data set, with the most significantly affected pathways involved in carbohydrate metabolism. We provide a table of 32 variation–substrate/product pairs that can be used in targeted metabolomics experiments to assay the effects of specific variations. In addition, we report the significant prevalence of aspartic acid to histidine variation in eight proteins associated with nine diseases including glycogen storage diseases, lacrimo‐auriculo‐dento‐digital syndrome, Parkinson's disease and several cancers.


Nucleic Acids Research | 2018

BioMuta and BioXpress: mutation and expression knowledgebases for cancer biomarker discovery

Hayley Dingerdissen; John Torcivia-Rodriguez; Yu Hu; Ting-Chia Chang; Raja Mazumder; Robel Kahsay

Single-nucleotide variation and gene expression of disease samples represent important resources for biomarker discovery. Many databases have been built to host and make available such data to the community, but these databases are frequently limited in scope and/or content. BioMuta, a database of cancer-associated single-nucleotide variations, and BioXpress, a database of cancer-associated differentially expressed genes and microRNAs, differ from other disease-associated variation and expression databases primarily through the aggregation of data across many studies into a single source with a unified representation and annotation of functional attributes. Early versions of these resources were initiated by pilot funding for specific research applications, but newly awarded funds have enabled hardening of these databases to production-level quality and will allow for sustained development of these resources for the next few years. Because both resources were developed using a similar methodology of integration, curation, unification, and annotation, we present BioMuta and BioXpress as allied databases that will facilitate a more comprehensive view of gene associations in cancer. BioMuta and BioXpress are hosted on the High-performance Integrated Virtual Environment (HIVE) server at the George Washington University at https://hive.biochemistry.gwu.edu/biomuta and https://hive.biochemistry.gwu.edu/bioxpress, respectively.


Genomics | 2017

HIVE-heptagon: A sensible variant-calling algorithm with post-alignment quality controls

Vahan Simonyan; Konstantin Chumakov; Eric Donaldson; Konstantinos Karagiannis; Phuc Vinh Nguyen Lam; Hayley Dingerdissen; Alin Voskanian

Advances in high-throughput sequencing (HTS) technologies have greatly increased the availability of genomic data and potential discovery of clinically significant genomic variants. However, numerous issues still exist with the analysis of these data, including data complexity, the absence of formally agreed upon best practices, and inconsistent reproducibility. Toward a more robust and reproducible variant-calling paradigm, we propose a series of selective noise filtrations and post-alignment quality control (QC) techniques that may reduce the rate of false variant calls. We have implemented both novel and refined post-alignment QC mechanisms to augment existing pre-alignment QC measures. These techniques can be used independently or in combination to identify and correct issues caused during data generation or early analysis stages. The adoption of these procedures by the broader scientific community is expected to improve the identification of clinically significant variants both in terms of computational efficiency and in the confidence of the results. Availability: https://hive.biochemistry.gwu.edu/.


Biology Direct | 2014

A framework for application of metabolic modeling in yeast to predict the effects of nsSNV in human orthologs

Hayley Dingerdissen; Daniel Weaver; Peter D. Karp; Yang Pan; Vahan Simonyan; Raja Mazumder

Background: We have previously suggested a method for proteome wide analysis of variation at functional residues wherein we identified the set of all human genes with nonsynonymous single nucleotide variation (nsSNV) in the active site residue of the corresponding proteins. 34 of these proteins were shown to have a 1:1:1 enzyme:pathway:reaction relationship, making these proteins ideal candidates for laboratory validation through creation and observation of specific yeast active site knock-outs and downstream targeted metabolomics experiments. Here we present the next step in the workflow toward using yeast metabolic modeling to predict human metabolic behavior resulting from nsSNV. Results: For the previously identified candidate proteins, we used the reciprocal best BLAST hits method followed by manual alignment and pathway comparison to identify 6 human proteins with yeast orthologs which were suitable for flux balance analysis (FBA). 5 of these proteins are known to be associated with diseases, including ribose 5-phosphate isomerase deficiency, myopathy with lactic acidosis and sideroblastic anaemia, anemia due to disorders of glutathione metabolism, and two porphyrias, and we suspect the sixth enzyme to have disease associations which are not yet classified or understood based on the work described herein. Conclusions: Preliminary findings using the Yeast 7.0 FBA model show lack of growth for only one enzyme, but augmentation of the Yeast 7.0 biomass function to better simulate knockout of certain genes suggested physiological relevance of variations in three additional proteins. Thus, we suggest the following four proteins for laboratory validation: delta-aminolevulinic acid dehydratase, ferrochelatase, ribose-5 phosphate isomerase and mitochondrial tyrosyl-tRNA synthetase. This study indicates that the predictive ability of this method will improve as more advanced, comprehensive models are developed. Moreover, these findings will be useful in the development of simple downstream biochemical or mass-spectrometric assays to corroborate these predictions and detect presence of certain known nsSNVs with deleterious outcomes. Results may also be useful in predicting as yet unknown outcomes of active site nsSNVs for enzymes that are not yet well classified or annotated. Reviewers: This article was reviewed by Daniel Haft and Igor B. Rogozin.
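
The reciprocal best hits idea named in this abstract can be illustrated with a minimal sketch. This is not the authors' pipeline: the input format (a dictionary of pairwise scores, such as BLAST bit scores) and the function names are assumptions made for illustration, and the manual alignment and pathway comparison steps the abstract describes are not modeled.

```python
def best_hits(scores: dict[tuple[str, str], float]) -> dict[str, str]:
    """Map each query ID to its single highest-scoring subject ID."""
    best: dict[str, tuple[str, float]] = {}
    for (query, subject), score in scores.items():
        # Keep the subject with the strictly highest score seen so far.
        if query not in best or score > best[query][1]:
            best[query] = (subject, score)
    return {query: subject for query, (subject, _) in best.items()}


def reciprocal_best_hits(
    a_vs_b: dict[tuple[str, str], float],
    b_vs_a: dict[tuple[str, str], float],
) -> list[tuple[str, str]]:
    """Return (a, b) pairs where b is a's best hit and a is b's best hit."""
    forward = best_hits(a_vs_b)   # proteome A searched against proteome B
    reverse = best_hits(b_vs_a)   # proteome B searched against proteome A
    return sorted((a, b) for a, b in forward.items() if reverse.get(b) == a)
```

A pair is reported only when the best hit agrees in both search directions, which is why a protein whose best match prefers a different partner is dropped.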


Scientific Reports | 2018

Loss and gain of N-linked glycosylation sequons due to single-nucleotide variation in cancer

Yu Fan; Yu Hu; Cheng Yan; Radoslav Goldman; Yang Pan; Raja Mazumder; Hayley Dingerdissen

Despite availability of sequence site-specific information resulting from years of sequencing and sequence feature curation, there have been few efforts to integrate and annotate this information. In this study, we update the number of human N-linked glycosylation sequons (NLGs), and we investigate cancer-relatedness of glycosylation-impacting somatic nonsynonymous single-nucleotide variation (nsSNV) by mapping human NLGs to cancer variation data and reporting the expected loss or gain of glycosylation sequon. We find 75.8% of all human proteins have at least one NLG for a total of 59,341 unique NLGs (includes predicted and experimentally validated). Only 27.4% of all NLGs are experimentally validated sites on 4,412 glycoproteins. With respect to cancer, 8,895 somatic-only nsSNVs abolish NLGs in 5,204 proteins and 12,939 somatic-only nsSNVs create NLGs in 7,356 proteins in cancer samples. nsSNVs causing loss of 24 NLGs on 23 glycoproteins and nsSNVs creating 41 NLGs on 40 glycoproteins are identified in three or more cancers. Of all identified cancer somatic variants causing potential loss or gain of glycosylation, only 36 have previously known disease associations. Although this work is computational, it builds on existing genomics and glycobiology research to promote identification and rank potential cancer nsSNV biomarkers for experimental validation.
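
As a rough illustration of how a single amino acid substitution can abolish or create a glycosylation sequon, the sketch below scans a protein sequence before and after a substitution. It assumes the canonical N-X-S/T motif (X being any residue except proline), uppercase one-letter residue codes, and 0-based positions; the abstract does not state the exact sequon definition or method the authors used, and the function names are hypothetical.

```python
import re

# Canonical sequon: Asn, any residue except Pro, then Ser or Thr.
# The lookahead lets overlapping sequons be found.
SEQUON = re.compile(r"(?=N[^P][ST])")


def sequon_positions(protein: str) -> set[int]:
    """Return 0-based start positions of N-X-S/T sequons (X != P)."""
    return {match.start() for match in SEQUON.finditer(protein)}


def sequon_changes(protein: str, pos: int, alt: str) -> tuple[set[int], set[int]]:
    """Substitute residue `alt` at 0-based `pos`; return (lost, gained) sequon starts."""
    mutated = protein[:pos] + alt + protein[pos + 1:]
    before = sequon_positions(protein)
    after = sequon_positions(mutated)
    return before - after, after - before
```

For example, in the made-up sequence `MKNGSAQNPTL`, replacing the serine at position 4 with alanine removes the sequon starting at position 2, while replacing the proline at position 8 with alanine creates one starting at position 7.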


Database | 2018

DEXTER: Disease-Expression Relation Extraction from Text.

Samir Gupta; Hayley Dingerdissen; Karen E. Ross; Yu Hu; Cathy H. Wu; Raja Mazumder; K. Vijay-Shanker

Gene expression levels affect biological processes and play a key role in many diseases. Characterizing expression profiles is useful for clinical research, and diagnostics and prognostics of diseases. There are currently several high-quality databases that capture gene expression information, obtained mostly from large-scale studies, such as microarray and next-generation sequencing technologies, in the context of disease. The scientific literature is another rich source of information on gene expression–disease relationships that not only have been captured from large-scale studies but have also been observed in thousands of small-scale studies. Expression information obtained from literature through manual curation can extend expression databases. While many of the existing databases include information from literature, they are limited by the time-consuming nature of manual curation and have difficulty keeping up with the explosion of publications in the biomedical field. In this work, we describe an automated text-mining tool, Disease-Expression Relation Extraction from Text (DEXTER) to extract information from literature on gene and microRNA expression in the context of disease. One of the motivations in developing DEXTER was to extend the BioXpress database, a cancer-focused gene expression database that includes data derived from large-scale experiments and manual curation of publications. The literature-based portion of BioXpress lags behind significantly compared to expression information obtained from large-scale studies and can benefit from our text-mined results. We have conducted two different evaluations to measure the accuracy of our text-mining tool and achieved average F-scores of 88.51 and 81.81% for the two evaluations, respectively. Also, to demonstrate the ability to extract rich expression information in different disease-related scenarios, we used DEXTER to extract information on differential expression information for 2024 genes in lung cancer, 115 glycosyltransferases in 62 cancers and 826 microRNA in 171 cancers. All extractions using DEXTER are integrated in the literature-based portion of BioXpress. Database URL: http://biotm.cis.udel.edu/DEXTER

Collaboration


Dive into Hayley Dingerdissen's collaborations.

Top Co-Authors

Raja Mazumder

George Washington University

Vahan Simonyan

Center for Biologics Evaluation and Research

Yang Pan

Washington University in St. Louis

Yu Hu

Washington University in St. Louis

Cheng Yan

George Washington University

Konstantinos Karagiannis

Washington University in St. Louis

Quan Wan

Washington University in St. Louis

John Torcivia-Rodriguez

Washington University in St. Louis

Naila Gulzar

George Washington University

Phuc Vinh Nguyen Lam

Center for Biologics Evaluation and Research
