Abhishek Niroula
Lund University
Publication
Featured research published by Abhishek Niroula.
PLOS ONE | 2015
Abhishek Niroula; Siddhaling Urolagin; Mauno Vihinen
More reliable and faster prediction methods are needed to interpret the enormous amounts of data generated by sequencing and genome projects. We have developed a new computational tool, PON-P2, for the classification of amino acid substitutions in human proteins. The method is a machine learning-based classifier that groups variants into pathogenic, neutral and unknown classes on the basis of a random forest probability score. PON-P2 is trained on pathogenic and neutral variants obtained from VariBench, a database of benchmark variation datasets. PON-P2 uses information about the evolutionary conservation of sequences, physical and biochemical properties of amino acids, GO annotations and, if available, functional annotations of variation sites. Extensive feature selection was performed to identify 8 informative features out of 622 in total. PON-P2 consistently showed superior performance in comparison to existing state-of-the-art tools. In 10-fold cross-validation, its accuracy and Matthews correlation coefficient (MCC) are 0.90 and 0.80, respectively, and in the independent test they are 0.86 and 0.71, respectively. The coverage of PON-P2 is 61.7% in the 10-fold cross-validation and 62.1% in the test dataset. PON-P2 is a powerful tool for screening harmful variants and for ranking and prioritizing experimental characterization. It is very fast, making it capable of analyzing large variant datasets. PON-P2 is freely available at http://structure.bmc.lu.se/PON-P2/.
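The abstract reports accuracy, MCC and coverage for a classifier that leaves some variants in an "unknown" class. A minimal Python sketch of how such figures relate, assuming fixed probability cut-offs: the abstract does not say how PON-P2 delimits the unknown class, so `PATHOGENIC_CUTOFF`, `NEUTRAL_CUTOFF`, `classify` and `evaluate` are hypothetical names and values chosen for illustration. Unknown predictions lower coverage but are left out of accuracy and MCC.

```python
import math
from typing import List, Optional, Tuple

# Hypothetical cut-offs: the abstract does not state how the "unknown"
# class is delimited, so these values are illustrative only.
PATHOGENIC_CUTOFF = 0.7
NEUTRAL_CUTOFF = 0.3


def classify(score: float) -> Optional[int]:
    """Map a pathogenicity probability to 1 (pathogenic), 0 (neutral) or None (unknown)."""
    if score >= PATHOGENIC_CUTOFF:
        return 1
    if score <= NEUTRAL_CUTOFF:
        return 0
    return None


def evaluate(scores: List[float], labels: List[int]) -> Tuple[float, float, float]:
    """Return (accuracy, MCC, coverage); unknown predictions only lower coverage."""
    tp = tn = fp = fn = 0
    for score, label in zip(scores, labels):
        pred = classify(score)
        if pred is None:
            continue  # excluded from accuracy and MCC
        if pred == 1 and label == 1:
            tp += 1
        elif pred == 0 and label == 0:
            tn += 1
        elif pred == 1:
            fp += 1
        else:
            fn += 1
    classified = tp + tn + fp + fn
    coverage = classified / len(scores)
    accuracy = (tp + tn) / classified if classified else 0.0
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    mcc = (tp * tn - fp * fn) / denom if denom else 0.0
    return accuracy, mcc, coverage
```

For example, `evaluate([0.9, 0.8, 0.2, 0.1, 0.5, 0.75], [1, 1, 0, 1, 0, 0])` leaves the 0.5 score unclassified and gives an accuracy of 0.6, an MCC of 1/6 and a coverage of 5/6.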
Human Mutation | 2016
Abhishek Niroula; Mauno Vihinen
Next‐generation sequencing methods have revolutionized the speed of generating variation information. Sequence data have a plethora of applications and will increasingly be used for disease diagnosis. Interpretation of the identified variants is usually not possible with experimental methods. This has caused a bottleneck that many computational methods aim to address. Fast and efficient methods for explaining the significance and mechanisms of detected variants are required for precision/personalized medicine. Computational prediction methods have been developed in three areas to address the issue. There are generic tolerance (pathogenicity) predictors for filtering harmful variants. Gene/protein/disease‐specific tools are available for some applications. Mechanism‐ and effect‐specific computer programs aim to explain the consequences of variations. Here, we discuss the different types of predictors and their applications. We review available variation databases and prediction methods useful for variation interpretation. We discuss how the performance of methods is assessed and summarize existing assessment studies. We also briefly introduce the principles of the methods developed for variation interpretation, provide guidelines for choosing the optimal tools, and outline where the field is heading.
Nucleic Acids Research | 2016
Abhishek Niroula; Mauno Vihinen
Transfer RNAs (tRNAs) are essential for translating the genetic information transcribed from DNA into proteins. Variations in human tRNAs are involved in diverse clinical phenotypes. Interestingly, all pathogenic variations in tRNAs are located in mitochondrial tRNAs (mt-tRNAs). Therefore, it is crucial to identify pathogenic variations in mt-tRNAs for disease diagnosis and proper treatment. We collected mt-tRNA variations, classified on the basis of evidence from several sources, and used the data to develop a multifactorial probability-based prediction method, PON-mt-tRNA, for the classification of mt-tRNA single nucleotide substitutions. We integrated a machine learning-based predictor and an evidence-based likelihood ratio for pathogenicity, using evidence from segregation, biochemistry and histochemistry, to predict the posterior probability of pathogenicity of variants. The accuracy and Matthews correlation coefficient (MCC) of PON-mt-tRNA are 1.00 and 0.99, respectively. In the absence of evidence from segregation, biochemistry and histochemistry, PON-mt-tRNA classifies variations based on the machine learning method alone, with an accuracy and MCC of 0.69 and 0.39, respectively. We classified all possible single nucleotide substitutions in all human mt-tRNAs using PON-mt-tRNA. Variations in the loops are tolerated more often than variations in the stems. The anticodon loop contains comparatively more predicted pathogenic variations than the other loops. PON-mt-tRNA is available at http://structure.bmc.lu.se/PON-mt-tRNA/.
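The abstract does not give the formula used to combine the predictor with the evidence-based likelihood ratio. A common way to do this in multifactorial variant classification is Bayes' rule in odds form: the predictor supplies a prior probability, and each independent line of evidence multiplies the prior odds by its likelihood ratio. The sketch below assumes that approach and independent evidence; `posterior_probability` is a hypothetical name, not part of PON-mt-tRNA.

```python
from typing import Iterable


def posterior_probability(prior: float, likelihood_ratios: Iterable[float]) -> float:
    """Combine a prior probability of pathogenicity with likelihood ratios.

    Posterior odds = prior odds * product of likelihood ratios (Bayes' rule
    in odds form). The lines of evidence are assumed to be independent.
    """
    if not 0.0 < prior < 1.0:
        raise ValueError("prior must be strictly between 0 and 1")
    odds = prior / (1.0 - prior)
    for lr in likelihood_ratios:
        if lr <= 0:
            raise ValueError("likelihood ratios must be positive")
        odds *= lr
    return odds / (1.0 + odds)
```

With a prior of 0.5 and likelihood ratios of 2 and 3 (for example, from segregation and biochemistry), the posterior is 6/7, about 0.86. With no evidence, the posterior equals the prior, which mirrors the abstract's fallback to the machine learning prediction alone.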
Human Mutation | 2015
Abhishek Niroula; Mauno Vihinen
Variations in mismatch repair (MMR) system genes cause Lynch syndrome and other cancers. Thousands of variants have been identified in MMR genes, but the clinical relevance is known for only a small proportion. Recently, the InSiGHT group classified 2,360 MMR variants into five classes. One‐third of the variants, the majority of which are nonsynonymous, remain of uncertain clinical relevance. Computational tools can be used to prioritize variants for disease relevance investigations. Previously, we classified 248 MMR variants as likely pathogenic or likely benign using PON‐MMR. We have now developed a novel tool, PON‐MMR2, which is trained on a larger and more reliable dataset. In a performance comparison, PON‐MMR2 outperforms both generic tolerance prediction methods and methods optimized for MMR variants. It achieves an accuracy and MCC of 0.89 and 0.78, respectively, in cross‐validation and 0.86 and 0.69, respectively, on an independent test dataset. We classified 354 class 3 variants in the InSiGHT database as well as all possible amino acid substitutions in four MMR proteins. Likely harmful variants mainly appear in the protein core, whereas likely benign variants are on the surface. PON‐MMR2 is a highly reliable tool to prioritize variants for functional analysis. It is freely available at http://structure.bmc.lu.se/PON-MMR2/.
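Classifying "all possible amino acid substitutions" in a protein starts by enumerating them: each of the L positions can change to any of the 19 other standard residues, giving 19 × L candidates to score. A minimal sketch, assuming the common short notation wild type + 1-based position + variant (e.g. `M1A`); `all_substitutions` is a hypothetical helper and the sequence below is a toy, not an MMR protein.

```python
from typing import List

# The 20 standard amino acids in one-letter code.
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def all_substitutions(sequence: str) -> List[str]:
    """List every single amino acid substitution as e.g. 'M1A'."""
    variants = []
    for position, wild_type in enumerate(sequence.upper(), start=1):
        if wild_type not in AMINO_ACIDS:
            raise ValueError(f"non-standard residue {wild_type!r} at position {position}")
        for alt in AMINO_ACIDS:
            if alt != wild_type:
                variants.append(f"{wild_type}{position}{alt}")
    return variants
```

For the toy sequence `"MK"` this yields 38 substitutions, starting with `M1A`. Each would then be passed to a predictor such as PON‐MMR2.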
Human Mutation | 2017
Abhishek Niroula; Mauno Vihinen
Most diseases, including those of genetic origin, express a continuum of severity. Clinical interventions for numerous diseases are based on the severity of the phenotype. Predicting the severity caused by genetic variants could facilitate diagnosis and the choice of therapy. Although computational predictions have been used as evidence for classifying the disease relevance of genetic variants, dedicated tools for predicting disease severity at large scale are lacking. Here, we manually curated a dataset containing variants leading to severe and less severe phenotypes and studied the ability of variation impact predictors to distinguish between them. We found that these tools cannot separate the two groups of variants. We then developed a novel machine‐learning‐based method, PON‐PS (http://structure.bmc.lu.se/PON-PS), for the classification of amino acid substitutions associated with benign, severe, and less severe phenotypes. We tested the method using an independent test dataset and variants in four additional proteins. In distinguishing severe from nonsevere variants, PON‐PS showed an accuracy of 61% on the test dataset, which is higher than that of existing tolerance prediction methods. PON‐PS is the first generic tool developed for this task. The tool can be used together with other evidence to improve diagnosis and prognosis and to prioritize preventive interventions, clinical monitoring, and molecular tests.
Bioinformatics | 2016
Yang Yang; Abhishek Niroula; Bairong Shen; Mauno Vihinen
MOTIVATION: Solubility is one of the fundamental protein properties. It is of great interest because of its relevance to protein expression. Reduced solubility and protein aggregation are also associated with many diseases.
RESULTS: We collected from the literature the largest dataset of experimentally verified solubility-affecting amino acid substitutions (AASs) and used it to train a predictor called PON-Sol. The predictor can distinguish both solubility-decreasing and solubility-increasing variants from those not affecting solubility. PON-Sol has a normalized correct prediction ratio of 0.491 in cross-validation and 0.432 on the independent test set. Its performance was compared with that of both solubility and aggregation predictors and found to be superior. PON-Sol can be used to predict the effects of disease-related substitutions, effects on heterologous recombinant protein expression, and enhanced crystallizability. One application is investigating the effects of all possible AASs in a protein to aid protein engineering.
AVAILABILITY AND IMPLEMENTATION: PON-Sol is freely available at http://structure.bmc.lu.se/PON-Sol. The training and test data are available at http://structure.bmc.lu.se/VariBench/.
SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.
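The abstract does not define the "normalized correct prediction ratio". One plausible reading, assumed here, is that the confusion matrix for the three classes (decrease, no effect, increase) is rescaled so every true class has the same number of cases before counting correct predictions. Under that assumption the ratio equals the mean of the per-class recalls, which keeps a dominant "no effect" class from inflating the score. `normalized_cpr` is a hypothetical name and the counts in the example are invented for illustration.

```python
from typing import List


def normalized_cpr(confusion: List[List[int]]) -> float:
    """Correct prediction ratio after rescaling every true class to the same size.

    confusion[i][j] counts cases of true class i predicted as class j.
    Rescaling each row to a common total makes the ratio equal to the
    mean of the per-class recalls.
    """
    recalls = []
    for i, row in enumerate(confusion):
        total = sum(row)
        if total == 0:
            raise ValueError(f"true class {i} has no cases")
        recalls.append(row[i] / total)
    return sum(recalls) / len(recalls)
```

For the invented matrix `[[30, 10, 0], [20, 150, 30], [5, 5, 10]]`, the per-class recalls are 0.75, 0.75 and 0.5, so the normalized ratio is 2/3. The raw ratio, 190/260 (about 0.73), is higher because the large middle class dominates.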
BMC Medical Genomics | 2015
Abhishek Niroula; Mauno Vihinen
Background: Cancer is characterized by the accumulation of large numbers of genetic variations and alterations of multiple biological phenomena. Cancer genomics has largely focused on the identification of such genetic alterations and the genes containing them, known as ‘cancer genes’. However, non-functional somatic variations outnumber functional ones and remain a major challenge. Recurrent somatic variations are thought to be cancer drivers, but they are present in only a small fraction of patients.
Methods: We performed an extensive analysis of amino acid substitutions (AASs) from 6,861 cancer samples (whole genome or exome sequences) classified into 30 cancer types and performed pathway enrichment analysis. We also studied the overlap between the cancers based on proteins containing harmful AASs and the pathways affected by them.
Results: We found that only a fraction of AASs (39.88%) are harmful, even in known cancer genes. In addition, we found that proteins containing harmful AASs in cancers are often centrally located in protein interaction networks. Based on the proteins containing harmful AASs, we identified significantly affected pathways in 28 cancer types, indicating that proteins containing harmful AASs can affect pathways regardless of the frequency of AASs in them. Our cross-cancer overlap analysis showed that it would be more beneficial to identify affected pathways in cancers rather than individual genes and variations.
Conclusion: Pathways affected by harmful AASs reveal key processes involved in cancer development. Our approach filters out putatively benign AASs, thus reducing the list of cancer variations and allowing reliable identification of affected pathways. The pathways identified in individual cancers and the overlap between cancer types open avenues for further experimental research and for developing targeted therapies and interventions.
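The abstract does not name the statistical test behind its pathway enrichment analysis. A widely used choice is the one-sided hypergeometric test: given N background proteins, K of them in a pathway, and n proteins carrying harmful AASs, it asks how likely an overlap of at least k would be by chance. A minimal sketch using only the standard library, assuming that test; `enrichment_p_value` is a hypothetical name.

```python
from math import comb


def enrichment_p_value(hits_in_pathway: int, pathway_size: int,
                       hits_total: int, genome_size: int) -> float:
    """One-sided hypergeometric P(X >= hits_in_pathway).

    genome_size:     background proteins (N)
    pathway_size:    proteins annotated to the pathway (K)
    hits_total:      proteins carrying harmful AASs (n)
    hits_in_pathway: overlap between the two sets (k)
    """
    upper = min(pathway_size, hits_total)
    total = comb(genome_size, hits_total)
    # Exact integer sum over the upper tail, divided once at the end.
    tail = sum(comb(pathway_size, i) * comb(genome_size - pathway_size, hits_total - i)
               for i in range(hits_in_pathway, upper + 1))
    return tail / total
```

In practice one P value is computed per pathway, so a multiple-testing correction such as Benjamini-Hochberg is usually applied before calling a pathway significantly affected (not shown here).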
Blood Advances | 2017
Britt-Marie Halvarsson; Anna-Karin Wihlborg; Mina Ali; Konstantinos Lemonakis; Ellinor Johnsson; Abhishek Niroula; Carrie Cibulskis; Niels Weinhold; Asta Försti; Evren Alici; Christian Langer; Michael Pfreundschuh; Hartmut Goldschmidt; Ulf-Henrik Mellqvist; Ingemar Turesson; Anders Waage; Kari Hemminki; Todd R. Golub; Hareth Nahi; Urban Gullberg; Markus Hansson; Björn Nilsson
Although common risk alleles for multiple myeloma (MM) were recently identified, their contribution to familial MM is unknown. Analyzing 38 familial cases identified primarily by linking Swedish nationwide registries, we demonstrate an enrichment of common MM risk alleles in familial compared with 1530 sporadic cases (P = 4.8 × 10⁻² and 6.0 × 10⁻², respectively, for 2 different polygenic risk scores) and 10,171 population-based controls (P = 1.5 × 10⁻⁴ and 1.3 × 10⁻⁴, respectively). Using mixture modeling, we estimate that about one-third of familial cases result from such enrichments. Our results provide the first direct evidence for a polygenic etiology in a familial hematologic malignancy.
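The abstract compares two polygenic risk scores without defining them. The usual construction is a weighted sum of risk-allele dosages (0, 1 or 2 copies), weighted by per-allele effect sizes such as log odds ratios from association studies. A minimal sketch assuming that construction; `polygenic_risk_score` and the variant IDs and weights in the example are hypothetical placeholders, not the MM risk loci.

```python
from typing import Dict


def polygenic_risk_score(dosages: Dict[str, float], weights: Dict[str, float]) -> float:
    """Weighted sum of risk-allele dosages over the scored variants.

    dosages maps variant IDs to risk-allele copies (0, 1 or 2) for one person.
    weights maps variant IDs to per-allele effect sizes (e.g. log odds ratios).
    Variants missing from the genotype data are skipped in this sketch;
    real pipelines usually impute them instead.
    """
    return sum(beta * dosages[snp] for snp, beta in weights.items() if snp in dosages)
```

With hypothetical weights `{"snpA": 0.2, "snpB": 0.1}` and dosages `{"snpA": 2, "snpB": 1}`, the score is 0.5. Comparing score distributions between familial cases, sporadic cases and controls is then a standard two-group test.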
Human Mutation | 2017
Roxana Daneshjou; Yanran Wang; Yana Bromberg; Samuele Bovo; Pier Luigi Martelli; Giulia Babbi; Pietro Di Lena; Rita Casadio; Matthew D. Edwards; David K. Gifford; David Jones; Laksshman Sundaram; Rajendra Rana Bhat; Xiaolin Li; Lipika R. Pal; Kunal Kundu; Yizhou Yin; John Moult; Yuxiang Jiang; Vikas Pejaver; Kymberleigh A. Pagel; Biao Li; Sean D. Mooney; Predrag Radivojac; Sohela Shah; Marco Carraro; Alessandra Gasparini; Emanuela Leonardi; Manuel Giollo; Carlo Ferrari
Precision medicine aims to predict a patient's disease risk and best therapeutic options by using that individual's genetic sequencing data. The Critical Assessment of Genome Interpretation (CAGI) is a community experiment consisting of genotype–phenotype prediction challenges; participants build models, undergo assessment, and share key findings. For CAGI 4, three challenges involved using exome‐sequencing data: Crohn's disease, bipolar disorder, and warfarin dosing. Previous CAGI challenges included prior versions of the Crohn's disease challenge. Here, we discuss the range of techniques used for phenotype prediction as well as the methods used for assessing predictive models. Additionally, we outline some of the difficulties associated with making predictions and evaluating them. The lessons learned from the exome challenges can be applied to both research and clinical efforts to improve phenotype prediction from genotype. In addition, these challenges serve as a vehicle for sharing clinical and research exome data in a secure manner with scientists who have a broad range of expertise, contributing to a collaborative effort to advance our understanding of genotype–phenotype relationships.
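The abstract does not list the assessment measures. For case/control challenges such as Crohn's disease or bipolar disorder, one widely used measure is the area under the ROC curve (AUC), which equals the probability that a randomly chosen case receives a higher score than a randomly chosen control. A minimal pairwise sketch of that definition; `roc_auc` is a hypothetical name, and this is not claimed to be the CAGI assessors' code.

```python
from typing import Sequence


def roc_auc(scores: Sequence[float], labels: Sequence[int]) -> float:
    """Probability that a random case (label 1) scores above a random control (label 0).

    Ties count as half a win. The pairwise loop is O(cases * controls),
    which is fine for small challenge datasets.
    """
    cases = [s for s, y in zip(scores, labels) if y == 1]
    controls = [s for s, y in zip(scores, labels) if y == 0]
    if not cases or not controls:
        raise ValueError("need at least one case and one control")
    wins = 0.0
    for c in cases:
        for d in controls:
            if c > d:
                wins += 1.0
            elif c == d:
                wins += 0.5
    return wins / (len(cases) * len(controls))
```

For scores `[0.9, 0.4, 0.6, 0.2]` with labels `[1, 1, 0, 0]`, three of the four case/control pairs are ordered correctly, so the AUC is 0.75. A value of 0.5 corresponds to random ranking.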
Human Mutation | 2017
Marco Carraro; Giovanni Minervini; Manuel Giollo; Yana Bromberg; Emidio Capriotti; Rita Casadio; Roland L. Dunbrack; Lisa Elefanti; P. Fariselli; Carlo Ferrari; Julian Gough; Panagiotis Katsonis; Emanuela Leonardi; Olivier Lichtarge; Chiara Menin; Pier Luigi Martelli; Abhishek Niroula; Lipika R. Pal; Susanna Repo; Maria Chiara Scaini; Mauno Vihinen; Qiong Wei; Qifang Xu; Yuedong Yang; Yizhou Yin; Jan Zaucha; Huiying Zhao; Yaoqi Zhou; Steven E. Brenner; John Moult
Correct phenotypic interpretation of variants of unknown significance for cancer‐associated genes is a diagnostic challenge as genetic screening gains popularity in the next‐generation sequencing era. The Critical Assessment of Genome Interpretation (CAGI) experiment aims to test and define the state of the art of genotype–phenotype interpretation. Here, we present the assessment of the CAGI p16INK4a challenge. Participants were asked to predict the effect on cellular proliferation of 10 variants for the p16INK4a tumor suppressor, a cyclin‐dependent kinase inhibitor encoded by the CDKN2A gene. Twenty‐two pathogenicity predictors were assessed with a variety of accuracy measures for reliability in a medical context. Different assessment measures were combined in an overall ranking to provide more robust results. The R scripts used for assessment are publicly available from a GitHub repository for future use in similar assessment exercises. Despite the limited size of the test set, the results vary widely, with some methods performing significantly better than others. Methods combining different strategies frequently outperform simpler approaches. The best predictor, from the Yang&Zhou lab, uses a machine learning method that combines an empirical energy function measuring protein stability with an evolutionary conservation term. The p16INK4a challenge highlights how subtle structural effects can neutralize otherwise deleterious variants.
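The abstract says several assessment measures were combined into an overall ranking; the actual scheme is in the authors' R scripts and is not reproduced here. As an illustration of the general idea only, a simple mean-rank aggregation can be sketched in Python: rank the predictors within each measure, average each predictor's ranks, and sort. `overall_ranking` is a hypothetical name, and the predictors and values in the example are invented.

```python
from typing import Dict, List


def overall_ranking(scores: Dict[str, Dict[str, float]]) -> List[str]:
    """Order predictors by their mean rank across assessment measures.

    scores[measure][predictor] holds a value where higher is better; negate
    lower-is-better measures (e.g. RMSD) first. Exact ties within a measure
    share the average of the ranks they span. Ties in the final mean rank
    are broken alphabetically.
    """
    predictors = sorted({p for by_pred in scores.values() for p in by_pred})
    for measure, by_pred in scores.items():
        if set(by_pred) != set(predictors):
            raise ValueError(f"measure {measure!r} does not score every predictor")
    rank_sums = {p: 0.0 for p in predictors}
    for by_pred in scores.values():
        ordered = sorted(by_pred.items(), key=lambda kv: kv[1], reverse=True)
        i = 0
        while i < len(ordered):
            j = i
            while j + 1 < len(ordered) and ordered[j + 1][1] == ordered[i][1]:
                j += 1
            shared_rank = (i + j) / 2 + 1  # 1-based average rank for the tie group
            for k in range(i, j + 1):
                rank_sums[ordered[k][0]] += shared_rank
            i = j + 1
    return sorted(predictors, key=lambda p: rank_sums[p] / len(scores))
```

Averaging ranks rather than raw values keeps measures on different scales (correlations, error magnitudes, accuracies) from dominating one another, at the cost of ignoring how large the gaps between predictors are.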