Publication


Featured research published by Konstantinos Karagiannis.


Genomics, Proteomics & Bioinformatics | 2013

Structure-based Comparative Analysis and Prediction of N-linked Glycosylation Sites in Evolutionarily Distant Eukaryotes

Phuc Vinh Nguyen Lam; Radoslav Goldman; Konstantinos Karagiannis; Tejas Narsule; Vahan Simonyan; Valerii Soika; Raja Mazumder

The asparagine-X-serine/threonine (NXS/T) motif, where X is any amino acid except proline, is the consensus motif for N-linked glycosylation. Significant numbers of high-resolution crystal structures of glycosylated proteins allow us to carry out structural analysis of the N-linked glycosylation sites (NGS). Our analysis shows that there is enough structural information from diverse glycoproteins to allow the development of rules which can be used to predict NGS. A Python-based tool was developed to investigate asparagines implicated in N-glycosylation in five species: Homo sapiens, Mus musculus, Drosophila melanogaster, Arabidopsis thaliana and Saccharomyces cerevisiae. Our analysis shows that 78% of all asparagines of NXS/T motif involved in N-glycosylation are localized in the loop/turn conformation in the human proteome. Similar distribution was revealed for all the other species examined. Comparative analysis of the occurrence of NXS/T motifs not known to be glycosylated and their reverse sequence (S/TXN) shows a similar distribution across the secondary structural elements, indicating that the NXS/T motif in itself is not biologically relevant. Based on our analysis, we have defined rules to determine NGS. Using machine learning methods based on these rules we can predict with 93% accuracy if a particular site will be glycosylated. If structural information is not available the tool uses structural prediction results resulting in 74% accuracy. The tool was used to identify glycosylation sites in 108 human proteins with structures and 2247 proteins without structures that have acquired NXS/T site/s due to non-synonymous variation. The tool, Structure Feature Analysis Tool (SFAT), is freely available to the public at http://hive.biochemistry.gwu.edu/tools/sfat.
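The NXS/T consensus motif described in the abstract above can be located in a protein sequence with a short regular expression. The following is a minimal sketch, not the SFAT implementation: the function name `find_sequons` and the example sequence are hypothetical, and the code assumes one-letter amino acid codes. It only finds candidate motifs; as the abstract notes, the motif alone does not determine whether a site is glycosylated.

```python
import re

# N-X-S/T sequon where X is any residue except proline.
# The zero-width lookahead lets overlapping motifs (e.g. "NNTS") all be reported.
SEQUON = re.compile(r"(?=(N[^P][ST]))")


def find_sequons(sequence):
    """Return (1-based position of the asparagine, motif) for each NXS/T match."""
    seq = sequence.upper()
    return [(m.start() + 1, m.group(1)) for m in SEQUON.finditer(seq)]


# Hypothetical example sequence, not taken from any real protein.
print(find_sequons("MNASNPTNNTS"))  # → [(2, 'NAS'), (8, 'NNT'), (9, 'NTS')]
```

The `NPT` triplet at position 5 is skipped because proline occupies the X position. Swapping the pattern for `(?=([ST][^P]N))` would give the reverse (S/TXN) motif used as a comparison in the abstract.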


Nucleic Acids Research | 2014

Human germline and pan-cancer variomes and their distinct functional profiles.

Yang Pan; Konstantinos Karagiannis; Haichen Zhang; Hayley Dingerdissen; Amirhossein Shamsaddini; Quan Wan; Vahan Simonyan; Raja Mazumder

Identification of non-synonymous single nucleotide variations (nsSNVs) has exponentially increased due to advances in Next-Generation Sequencing technologies. The functional impacts of these variations have been difficult to ascertain because the corresponding knowledge about sequence functional sites is quite fragmented. It is clear that mapping of variations to sequence functional features can help us better understand the pathophysiological role of variations. In this study, we investigated the effect of nsSNVs on more than 17 common types of post-translational modification (PTM) sites, active sites and binding sites. Out of 1 705 285 distinct nsSNVs on 259 216 functional sites we identified 38 549 variations that significantly affect 10 major functional sites. Furthermore, we found distinct patterns of site disruptions due to germline and somatic nsSNVs. Pan-cancer analysis across 12 different cancer types led to the identification of 51 genes with 106 nsSNV affected functional sites found in 3 or more cancer types. 13 of the 51 genes overlap with previously identified Significantly Mutated Genes (Nature. 2013 Oct 17;502(7471)). 62 mutations in these 13 genes affecting functional sites such as DNA, ATP binding and various PTM sites occur across several cancers and can be prioritized for additional validation and investigations.


Database | 2016

High-performance integrated virtual environment (HIVE): a robust infrastructure for next-generation sequence data analysis

Vahan Simonyan; Konstantin Chumakov; Hayley Dingerdissen; William J. Faison; Scott Goldweber; Anton Golikov; Naila Gulzar; Konstantinos Karagiannis; Phuc Vinh Nguyen Lam; Thomas Maudru; Olesja Muravitskaja; Ekaterina Osipova; Yang Pan; Alexey Pschenichnov; Alexandre Rostovtsev; Luis V. Santana-Quintero; Krista Smith; Elaine E. Thompson; Valery Tkachenko; John Torcivia-Rodriguez; Alin Voskanian; Quan Wan; Jing Wang; Tsung-Jung Wu; Carolyn A. Wilson; Raja Mazumder

The High-performance Integrated Virtual Environment (HIVE) is a distributed storage and compute environment designed primarily to handle next-generation sequencing (NGS) data. This multicomponent cloud infrastructure provides secure web access for authorized users to deposit, retrieve, annotate and compute on NGS data, and to analyse the outcomes using web interface visual environments appropriately built in collaboration with research and regulatory scientists and other end users. Unlike many massively parallel computing environments, HIVE uses a cloud control server which virtualizes services, not processes. It is both very robust and flexible due to the abstraction layer introduced between computational requests and operating system processes. The novel paradigm of moving computations to the data, instead of moving data to computational nodes, has proven to be significantly less taxing for both hardware and network infrastructure. The honeycomb data model developed for HIVE integrates metadata into an object-oriented model. Its distinction from other object-oriented databases is in the additional implementation of a unified application program interface to search, view and manipulate data of all types. This model simplifies the introduction of new data types, thereby minimizing the need for database restructuring and streamlining the development of new integrated information systems. The honeycomb model employs a highly secure hierarchical access control and permission system, allowing determination of data access privileges in a finely granular manner without flooding the security subsystem with a multiplicity of rules. HIVE infrastructure will allow engineers and scientists to perform NGS analysis in a manner that is both efficient and secure. HIVE is actively supported in public and private domains, and project collaborations are welcomed. Database URL: https://hive.biochemistry.gwu.edu


FEBS Journal | 2013

Proteome‐wide analysis of nonsynonymous single‐nucleotide variations in active sites of human proteins

Hayley Dingerdissen; Mona Motwani; Konstantinos Karagiannis; Vahan Simonyan; Raja Mazumder

An enzyme's active site is essential to normal protein activity such that any disruptions at this site may lead to dysfunction and disease. Nonsynonymous single‐nucleotide variations (nsSNVs), which alter the amino acid sequence, are one type of disruption that can alter the active site. When this occurs, it is assumed that enzyme activity will vary because of the criticality of the site to normal protein function. We integrate nsSNV data and active site annotations from curated resources to identify all active‐site‐impacting nsSNVs in the human genome and search for all pathways observed to be associated with this data set to assess the likely consequences. We find that there are 934 unique nsSNVs that occur at the active sites of 559 proteins. Analysis of the nsSNV data shows an over‐representation of arginine and an under‐representation of cysteine, phenylalanine and tyrosine when comparing the list of nsSNV‐impacted active site residues with the list of all possible proteomic active site residues, implying a potential bias for or against variation of these residues at the active site. Clustering analysis shows an abundance of hydrolases and transferases. Pathway and functional analysis shows several pathways over‐ or under‐represented in the data set, with the most significantly affected pathways involved in carbohydrate metabolism. We provide a table of 32 variation–substrate/product pairs that can be used in targeted metabolomics experiments to assay the effects of specific variations. In addition, we report the significant prevalence of aspartic acid to histidine variation in eight proteins associated with nine diseases including glycogen storage diseases, lacrimo‐auriculo‐dento‐digital syndrome, Parkinson's disease and several cancers.


BMC Bioinformatics | 2014

Non-synonymous variations in cancer and their effects on the human proteome: workflow for NGS data biocuration and proteome-wide analysis of TCGA data

Charles Cole; Konstantinos Krampis; Konstantinos Karagiannis; Jonas S. Almeida; William J. Faison; Mona Motwani; Quan Wan; Anton Golikov; Yang Pan; Vahan Simonyan; Raja Mazumder

Background: Next-generation sequencing (NGS) technologies have resulted in petabytes of scattered data, decentralized in archives, databases and sometimes in isolated hard-disks which are inaccessible for browsing and analysis. It is expected that curated secondary databases will help organize some of this Big Data, thereby allowing users to better navigate, search and compute on it. Results: To address the above challenge, we have implemented an NGS biocuration workflow and are analyzing short read sequences and associated metadata from cancer patients to better understand the human variome. Curation of variation and other related information from control (normal tissue) and case (tumor) samples will provide comprehensive background information that can be used in genomic medicine research and application studies. Our approach includes a CloudBioLinux Virtual Machine which is used upstream of an integrated High-performance Integrated Virtual Environment (HIVE) that encapsulates Curated Short Read archive (CSR) and a proteome-wide variation effect analysis tool (SNVDis). As a proof-of-concept, we have curated and analyzed control and case breast cancer datasets from the NCI cancer genomics program - The Cancer Genome Atlas (TCGA). Our efforts include reviewing and recording in CSR available clinical information on patients, mapping of the reads to the reference followed by identification of non-synonymous Single Nucleotide Variations (nsSNVs) and integrating the data with tools that allow analysis of the effect of nsSNVs on the human proteome. Furthermore, we have also developed a novel phylogenetic analysis algorithm that uses SNV positions and can be used to classify the patient population. The workflow described here lays the foundation for analysis of short read sequence data to identify rare and novel SNVs that are not present in dbSNP and therefore provides a more comprehensive understanding of the human variome. Variation results for single genes as well as the entire study are available from the CSR website (http://hive.biochemistry.gwu.edu/dna.cgi?cmd=csr). Conclusions: Availability of thousands of sequenced samples from patients provides a rich repository of sequence information that can be utilized to identify individual level SNVs and their effect on the human proteome beyond what the dbSNP database provides.


Genomics, Proteomics & Bioinformatics | 2013

SNVDis: A proteome-wide analysis service for evaluating nsSNVs in protein functional sites and pathways

Konstantinos Karagiannis; Vahan Simonyan; Raja Mazumder

Amino acid changes due to non-synonymous variation are included as annotations for individual proteins in UniProtKB/Swiss-Prot and RefSeq which present biological data in a protein- or gene-centric fashion. Unfortunately, proteome-wide analysis of non-synonymous single-nucleotide variations (nsSNVs) is not easy to perform because information on nsSNVs and functionally important sites are not well integrated both within and between databases and their search engines. We have developed SNVDis that allows evaluation of proteome-wide nsSNV distribution in functional sites, domains and pathways. More specifically, we have integrated human-specific data from major variation databases (UniProtKB, dbSNP and COSMIC), comprehensive sequence feature annotation from UniProtKB, Pfam, RefSeq, Conserved Domain Database (CDD) and pathway information from Protein ANalysis THrough Evolutionary Relationships (PANTHER) and mapped all of them in a uniform and comprehensive way to the human reference proteome provided by UniProtKB/Swiss-Prot. Integrated information of active sites, pathways, binding sites, domains, which are extracted from a number of different sources, provides a detailed overview of how nsSNVs are distributed over the human proteome and pathways and how they intersect with functional sites of proteins. Additionally, it is possible to find out whether there is an over- or under-representation of nsSNVs in specific domains, pathways or user-defined protein lists. The underlying datasets are updated once every 3 months. SNVDis is freely available at http://hive.biochemistry.gwu.edu/tool/snvdis.
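The abstract above mentions testing for over- or under-representation of nsSNVs in a domain, pathway or protein list, but does not say which statistic SNVDis uses. One common choice for this kind of question is the hypergeometric test, sketched below purely as an illustration. The function name, parameter names and the numbers in the example are all hypothetical and are not taken from SNVDis.

```python
from math import comb


def enrichment_p_value(k, n, K, N):
    """Upper-tail hypergeometric probability P(X >= k).

    N: total number of positions in the background (assumed known)
    K: positions in the background that carry an nsSNV
    n: positions that fall inside the category of interest (e.g. one domain)
    k: positions inside the category that carry an nsSNV
    """
    total = comb(N, n)
    upper = min(K, n)
    # Sum the probabilities of seeing k, k+1, ..., upper hits in the category.
    favourable = sum(comb(K, i) * comb(N - K, n - i) for i in range(k, upper + 1))
    return favourable / total


# Hypothetical numbers: 10 positions, 5 with variants, a 5-position category
# in which every position carries a variant.
print(enrichment_p_value(5, 5, 5, 10))
```

A small value suggests over-representation. Under-representation would be assessed with the lower tail instead, that is, by summing from 0 to k.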


Nucleic Acids Research | 2017

Separation and assembly of deep sequencing data into discrete sub-population genomes

Konstantinos Karagiannis; Vahan Simonyan; Konstantin Chumakov; Raja Mazumder

Sequence heterogeneity is a common characteristic of RNA viruses that is often referred to as sub-populations or quasispecies. Traditional techniques used for assembly of short sequence reads produced by deep sequencing, such as de-novo assemblers, ignore the underlying diversity. Here, we introduce a novel algorithm that simultaneously assembles discrete sequences of multiple genomes present in populations. Using in silico data we were able to detect populations at as low as 0.1% frequency with complete global genome reconstruction and in a single sample detected 16 resolved sequences with no mismatches. We also applied the algorithm to high throughput sequencing data obtained for viruses present in sewage samples and successfully detected multiple sub-populations and recombination events in these diverse mixtures. High sensitivity of the algorithm also enables genomic analysis of heterogeneous pathogen genomes from patient samples and accurate detection of intra-host diversity, enabling not just basic research in personalized medicine but also accurate diagnostics and monitoring drug therapies, which are critical in clinical and regulatory decision-making process.


Genomics | 2017

HIVE-heptagon: A sensible variant-calling algorithm with post-alignment quality controls

Vahan Simonyan; Konstantin Chumakov; Eric Donaldson; Konstantinos Karagiannis; Phuc Vinh Nguyen Lam; Hayley Dingerdissen; Alin Voskanian

Advances in high-throughput sequencing (HTS) technologies have greatly increased the availability of genomic data and potential discovery of clinically significant genomic variants. However, numerous issues still exist with the analysis of these data, including data complexity, the absence of formally agreed upon best practices, and inconsistent reproducibility. Toward a more robust and reproducible variant-calling paradigm, we propose a series of selective noise filtrations and post-alignment quality control (QC) techniques that may reduce the rate of false variant calls. We have implemented both novel and refined post-alignment QC mechanisms to augment existing pre-alignment QC measures. These techniques can be used independently or in combination to identify and correct issues caused during data generation or early analysis stages. The adoption of these procedures by the broader scientific community is expected to improve the identification of clinically significant variants both in terms of computational efficiency and in the confidence of the results. Availability: https://hive.biochemistry.gwu.edu/.


PLOS Pathogens | 2018

Evolution of echovirus 11 in a chronically infected immunodeficient patient

Majid Laassri; Tatiana Zagorodnyaya; Sharon Hassin-Baer; Rachel Handsher; Danit Sofer; Merav Weil; Konstantinos Karagiannis; Vahan Simonyan; Konstantin Chumakov; Lester M. Shulman

Deep sequencing was used to determine complete nucleotide sequences of echovirus 11 (EV11) strains isolated from a chronically infected patient with CVID as well as from cases of acute enterovirus infection. Phylogenetic analysis showed that EV11 strains that circulated in Israel in the 1980s and 1990s could be divided into four clades. EV11 strains isolated from a chronically infected individual belonged to one of the four clades and over a period of 4 years accumulated mutations at a relatively constant rate. Extrapolation of the mutation accumulation curve into the past suggested that the individual was infected with circulating EV11 in the first half of the 1990s. Genomic regions coding for individual viral proteins did not appear to be under strong selective pressure except for protease 3C, which was remarkably conserved. This may suggest its important role in maintaining persistent infection.


Archive | 2017

Analysis of HIV-1 quasispecies sequences generated by High Throughput Sequencing (HTS) using HIVE

Naila Gulzar; Bhavna Hora; Konstantinos Karagiannis; Krista Smith; Feng Gao; Raja Mazumder

Collaboration


Dive into Konstantinos Karagiannis's collaborations.

Top Co-Authors

Raja Mazumder

George Washington University

Vahan Simonyan

Center for Biologics Evaluation and Research

Konstantin Chumakov

Center for Biologics Evaluation and Research

Hayley Dingerdissen

Washington University in St. Louis

Krista Smith

Center for Biologics Evaluation and Research

Naila Gulzar

George Washington University

Phuc Vinh Nguyen Lam

Center for Biologics Evaluation and Research

Quan Wan

Washington University in St. Louis

Yang Pan

Washington University in St. Louis

Alin Voskanian

Center for Biologics Evaluation and Research
