Publication


Featured research published by Sahir Bhatnagar.


Frontiers in Genetics | 2016

Gene Coexpression Analyses Differentiate Networks Associated with Diverse Cancers Harboring TP53 Missense or Null Mutations.

Kathleen Oros Klein; Karim Oualkacha; Marie-Hélène Lafond; Sahir Bhatnagar; Patricia N. Tonin; Celia M. T. Greenwood

In a variety of solid cancers, missense mutations in the well-established TP53 tumor suppressor gene may lead to the presence of a partially functioning protein molecule, whereas mutations affecting the protein-encoding reading frame, often referred to as null mutations, result in the absence of p53 protein. Both types of mutations have been observed in the same cancer type. As the resulting tumor biology may be quite different between these two groups, we used RNA-sequencing data from The Cancer Genome Atlas (TCGA) from four different cancers with poor prognosis, namely ovarian, breast, lung and skin cancers, to compare the patterns of coexpression of genes in tumors grouped according to their TP53 missense or null mutation status. We used Weighted Gene Coexpression Network Analysis (WGCNA) and a new test statistic built on differences between groups in the measures of gene connectivity. For each cancer, our analysis identified a set of genes showing differential coexpression patterns between the TP53 missense- and null mutation-carrying groups that was robust to the choice of the tuning parameter in WGCNA. After comparing these sets of genes across the four cancers, one gene (KIR3DL2) consistently showed differential coexpression patterns between the null and missense groups. KIR3DL2 is known to play an important role in regulating the immune response, which is consistent with our observation that this gene's strongly correlated partners implicated many immune-related pathways. Examining mutation-type-related changes in correlations between sets of genes may provide new insight into tumor biology.
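To make the idea of "differences between groups in the measures of gene connectivity" concrete, here is a minimal sketch. It assumes the common unsigned WGCNA-style definition, in which the adjacency of two genes is |correlation|^beta and a gene's connectivity is the sum of its adjacencies to all other genes. The function names, the toy data and the choice beta = 6 are illustrative assumptions, and the plain difference computed here is not the test statistic proposed in the paper.

```python
from math import sqrt


def pearson(x, y):
    """Pearson correlation of two equal-length, non-constant sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


def connectivity(expr, beta=6):
    """Connectivity per gene: sum of |cor|^beta over all other genes.

    expr maps gene name -> expression values across the samples of one group.
    beta is the soft-thresholding power (the tuning parameter); 6 is only
    an illustrative value.
    """
    genes = list(expr)
    return {
        g: sum(abs(pearson(expr[g], expr[h])) ** beta for h in genes if h != g)
        for g in genes
    }


def connectivity_difference(expr_a, expr_b, beta=6):
    """Per-gene connectivity in group A minus connectivity in group B."""
    ka, kb = connectivity(expr_a, beta), connectivity(expr_b, beta)
    return {g: ka[g] - kb[g] for g in ka}


# Hypothetical toy data: three genes measured in four tumors per group.
missense = {"g1": [1, 2, 3, 4], "g2": [2, 4, 6, 8], "g3": [1, -1, 1, -1]}
null = {"g1": [1, 2, 3, 4], "g2": [1, -1, 1, -1], "g3": [1, -1, 1, -1]}
print(connectivity_difference(missense, null))
```

A gene with a large absolute difference is one whose coexpression partners change between the two mutation groups; repeating the calculation over several values of beta is one way to look at robustness to the tuning parameter.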


BMC Proceedings | 2016

Joint analysis of multiple blood pressure phenotypes in GAW19 data by using a multivariate rare-variant association test

Jianping Sun; Sahir Bhatnagar; Karim Oualkacha; Antonio Ciampi; Celia M. T. Greenwood

Introduction: Large-scale sequencing studies often measure many related phenotypes in addition to the genetic variants. Joint analysis of multiple phenotypes in genetic association studies may increase power to detect disease-associated loci. Methods: We apply a recently developed multivariate rare-variant association test to the Genetic Analysis Workshop 19 data in order to test associations between genetic variants and multiple blood pressure phenotypes simultaneously. We also compare this multivariate test with a widely used univariate test that analyzes phenotypes separately. Results: The multivariate test identified two genetic variants that have been previously reported as associated with hypertension or coronary artery disease. In addition, our region-based analyses also show that the multivariate test tends to give smaller p values than the univariate test. Conclusions: Hence, the multivariate test has potential to improve test power, especially when multiple phenotypes are correlated.


BMC Proceedings | 2016

Assessing transmission ratio distortion in extended families: a comparison of analysis methods

Sahir Bhatnagar; Celia M. T. Greenwood; Aurelie Labbe

A statistical departure from Mendel’s law of segregation is known as transmission ratio distortion. Although transmission ratio distortion is well documented in many other organisms, our understanding of its extent and influence in the human genome remains incomplete. Using Genetic Analysis Workshop 19 whole genome sequence data from 20 large Mexican American pedigrees, our goal was to identify potentially distorted regions in the genome using family-based association methods such as the transmission disequilibrium test, the pedigree disequilibrium test, and the family-based association test. Preliminary results showed an unusually high number of transmission ratio distortion signals identified by the transmission disequilibrium test, but this phenomenon could not be replicated by the pedigree disequilibrium test or family-based association test. Applying these tests to different subsets of the data, we found the transmission disequilibrium test to be very sensitive to imputed genotypes. Regression analysis of transmission ratio distortion test p values controlling for minor allele frequency and quality control checks showed that Hardy-Weinberg p values are associated with this inflation. Although the transmission disequilibrium test appears confounded by imputation of single nucleotide polymorphisms, the pedigree disequilibrium test and family-based association test seem to offer more robust alternatives when searching for transmission ratio distortion loci in whole genome sequence data from extended families.
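For readers unfamiliar with the transmission disequilibrium test, the following is a minimal sketch of its classic trio form: among heterozygous parents, count how often one allele is transmitted versus not transmitted, and compare (b - c)^2 / (b + c) with a chi-square distribution on one degree of freedom. The function name and the example counts are hypothetical, and this is the textbook statistic rather than the pedigree-aware analysis pipeline used in the study.

```python
from math import erfc, sqrt


def tdt(transmitted, untransmitted):
    """Classic TDT (McNemar-type) statistic and its 1-df chi-square p value.

    transmitted / untransmitted: number of times heterozygous parents did or
    did not pass the allele of interest to an offspring.
    """
    total = transmitted + untransmitted
    if total == 0:
        raise ValueError("no informative (heterozygous) parental transmissions")
    stat = (transmitted - untransmitted) ** 2 / total
    # Survival function of a chi-square with 1 df: P(X > s) = erfc(sqrt(s / 2)).
    p_value = erfc(sqrt(stat / 2))
    return stat, p_value


# Hypothetical counts: allele transmitted 60 times, not transmitted 40 times.
stat, p_value = tdt(60, 40)
print(f"chi-square = {stat:.2f}, p = {p_value:.4f}")
```

Under Mendelian segregation the two counts are expected to be equal, so a small p value flags a locus as a candidate for transmission ratio distortion. Because the statistic depends only on these counts, systematic genotype errors (for example from imputation) can shift them and inflate the test.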


bioRxiv | 2018

Identification of new therapeutic targets for osteoarthritis through genome-wide analyses of UK Biobank

Ioanna Tachmazidou; Konstantinos Hatzikotoulas; Lorraine Southam; Jorge Esparza-Gordillo; Valeriia Haberland; Jie Zheng; Toby Johnson; Mine Koprulu; Eleni Zengini; Julia Steinberg; J.M. Wilkinson; Sahir Bhatnagar; Joshua Hoffman; Natalie Buchan; Daniel Suveges; Laura Yerges Armstrong; George Davey Smith; Tom R. Gaunt; Robert A. Scott; Linda C. McCarthy; Eleftheria Zeggini

Osteoarthritis is the most common musculoskeletal disease and the leading cause of disability globally. Here, we perform the largest genome-wide association study for osteoarthritis to date (77,052 cases and 378,169 controls), analysing 4 phenotypes: knee osteoarthritis, hip osteoarthritis, knee and/or hip osteoarthritis, and any osteoarthritis. We discover 64 signals, 52 of them novel, more than doubling the number of established disease loci. Six signals fine map to a single variant. We identify putative effector genes by integrating eQTL colocalization, fine-mapping, human rare disease, animal model, and osteoarthritis tissue expression data. We find enrichment for genes underlying monogenic forms of bone development diseases, and for the collagen formation and extracellular matrix organisation biological pathways. Ten of the likely effector genes, including TGFB1, FGF18, CTSK and IL11 have therapeutics approved or in clinical trials, with mechanisms of action supportive of evaluation for efficacy in osteoarthritis.


bioRxiv | 2018

Sparse additive interaction learning

Sahir Bhatnagar; Amanda Lovato; Yi Yang; Celia M. T. Greenwood

Diseases are now thought to be the result of changes in entire biological networks whose states are affected by a complex interaction of genetic and environmental factors. In general, power to estimate interactions is low, the number of possible interactions could be enormous and their effects may be non-linear. Existing approaches such as the lasso might keep an interaction but remove a main effect, which is problematic for interpretation. In this work, we introduce a sparse additive interaction learning model called sail for detecting non-linear interactions with a key environmental or exposure variable in high-dimensional settings. Our method can accommodate either the strong or weak heredity constraints. We develop a computationally efficient fitting algorithm with automatic tuning parameter selection, which scales to high-dimensional datasets. Through an extensive simulation study, we show that sail outperforms existing penalized regression methods in terms of prediction error, sensitivity and specificity when there are non-linear interactions with an exposure variable. We apply sail to the Alzheimer's Disease Neuroimaging Initiative (ADNI) data to select non-linear interactions between clinical diagnosis and Aβ protein in 96 brain regions on mini-mental state examination. Our algorithms are available in an R package (https://github.com/greenwoodlab, https://sahirbhatnagar.com/sail/).

A conceptual paradigm for onset of a new disease is often considered to be the result of changes in entire biological networks whose states are affected by a complex interaction of genetic and environmental factors. However, when modelling a relevant phenotype as a function of high-dimensional measurements, power to estimate interactions is low, the number of possible interactions could be enormous and their effects may be non-linear. Existing approaches for high-dimensional modelling such as the lasso might keep an interaction but remove a main effect, which is problematic for interpretation. In this work, we introduce a method called sail for detecting non-linear interactions with a key environmental or exposure variable in high-dimensional settings which respects either the strong or weak heredity constraints. We prove that asymptotically, our method possesses the oracle property, i.e., it performs as well as if the true model were known in advance. We develop a computationally efficient fitting algorithm with automatic tuning parameter selection, which scales to high-dimensional datasets. Through an extensive simulation study, we show that sail outperforms existing penalized regression methods in terms of prediction accuracy and support recovery when there are non-linear interactions with an exposure variable. We then apply sail to detect non-linear interactions between genes and a prenatal psychosocial intervention program on cognitive performance in children at 4 years of age. Results from our method show that individuals who are genetically predisposed to lower educational attainment are those who stand to benefit the most from the intervention. Our algorithms are implemented in an R package available on CRAN (https://cran.r-project.org/package=sail).
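The heredity constraints mentioned above can be stated as a simple check on a fitted model. The sketch below assumes the usual definitions: under strong heredity an exposure-by-predictor interaction may be non-zero only if both the exposure main effect and that predictor's main effect are non-zero, while under weak heredity at least one of them must be. The function and argument names are hypothetical and are not part of the sail package's interface.

```python
def heredity_violations(beta_exposure, beta_main, beta_interaction, kind="strong"):
    """List predictors whose selected interaction breaks the heredity rule.

    beta_exposure: coefficient of the exposure main effect.
    beta_main: dict predictor -> main-effect coefficient.
    beta_interaction: dict predictor -> exposure-by-predictor coefficient.
    kind: "strong" (both main effects required) or "weak" (at least one).
    """
    if kind not in ("strong", "weak"):
        raise ValueError("kind must be 'strong' or 'weak'")
    exposure_on = beta_exposure != 0
    violations = []
    for name, gamma in beta_interaction.items():
        if gamma == 0:
            continue  # interaction not selected, nothing to check
        main_on = beta_main.get(name, 0) != 0
        ok = (exposure_on and main_on) if kind == "strong" else (exposure_on or main_on)
        if not ok:
            violations.append(name)
    return violations


# Hypothetical lasso-style fit that kept an interaction for x2 but dropped
# its main effect, the situation described as problematic for interpretation.
main = {"x1": 1.2, "x2": 0.0}
inter = {"x1": 0.3, "x2": -0.4}
print(heredity_violations(0.5, main, inter, kind="strong"))  # ['x2']
print(heredity_violations(0.5, main, inter, kind="weak"))    # []
```

A method that enforces the constraint during fitting never produces such violations; a post hoc check like this one is mainly useful for comparing against unconstrained fits.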


bioRxiv | 2018

A General Framework for Variable Selection in Linear Mixed Models with Applications to Genetic Studies with Structured Populations

Sahir Bhatnagar; Karim Oualkacha; Yi Yang; Marie Forest; Celia M. T. Greenwood

Complex traits are known to be influenced by a combination of environmental factors and rare and common genetic variants. However, detection of such multivariate associations can be compromised by low statistical power and confounding by population structure. Linear mixed effect models (LMM) can account for correlations due to relatedness but have not been applicable in high-dimensional (HD) settings where the number of fixed effect predictors greatly exceeds the number of samples. False positives can result from two-stage approaches, where the residuals estimated from a null model adjusted for the subjects’ relationship structure are subsequently used as the response in a standard penalized regression model. To overcome these challenges, we develop a general penalized LMM framework called ggmix that simultaneously, in one step, selects variables and estimates their effects, while accounting for between-individual correlations. Our method can accommodate several sparsity-inducing penalties such as the lasso, elastic net and group lasso, and also readily handles prior annotation information in the form of weights. We develop a blockwise coordinate descent algorithm which is highly scalable, computationally efficient and has theoretical guarantees of convergence. Through simulations, we show that ggmix leads to correct Type 1 error control and improved variance component estimation compared to the two-stage approach or principal component adjustment. ggmix is also robust to different kinship structures and heritability proportions. Our algorithms are available in an R package (https://github.com/greenwoodlab).

Complex traits are known to be influenced by a combination of environmental factors and rare and common genetic variants. However, detection of such multivariate associations can be compromised by low statistical power and confounding by population structure. Linear mixed effects models (LMM) can account for correlations due to relatedness but have not been applicable in high-dimensional (HD) settings where the number of fixed effect predictors greatly exceeds the number of samples. False positives or false negatives can result from two-stage approaches, where the residuals estimated from a null model adjusted for the subjects’ relationship structure are subsequently used as the response in a standard penalized regression model. To overcome these challenges, we develop a general penalized LMM framework called ggmix for simultaneous SNP selection and adjustment for population structure in high-dimensional prediction models. Our method can accommodate several sparsity-inducing penalties such as the lasso, elastic net and group lasso, and also readily handles prior annotation information in the form of weights. We develop a blockwise coordinate descent algorithm which is highly scalable, computationally efficient and has theoretical guarantees of convergence. Through simulations and two real data examples, we show that ggmix leads to better sensitivity and specificity compared to the two-stage approach or principal component adjustment with better prediction accuracy. ggmix can be used to construct polygenic risk scores and select instrumental variables in Mendelian randomization studies. Our algorithms are available in an R package (https://github.com/greenwoodlab/ggmix).

Author Summary: This work addresses a recurring challenge in the analysis and interpretation of genetic association studies: which genetic variants can best predict and are independently associated with a given phenotype in the presence of population structure? Not controlling confounding due to geographic population structure, family and/or cryptic relatedness can lead to spurious associations. Much of the existing research has therefore focused on modeling the association between a phenotype and a single genetic variant in a linear mixed model with a random effect. However, this univariate approach may miss true associations due to the stringent significance thresholds required to reduce the number of false positives and also ignores the correlations between markers. We propose an alternative method for fitting high-dimensional multivariable models, which selects SNPs that are independently associated with the phenotype while also accounting for population structure. We provide an efficient implementation of our algorithm and show through simulation studies and real data examples that our method outperforms existing methods in terms of prediction accuracy and controlling the false discovery rate.


bioRxiv | 2018

Machine Learning to Predict Osteoporotic Fracture Risk from Genotypes

Vincenzo Forgetta; Julyan Keller-Baruch; Marie Forest; Audrey Durand; Sahir Bhatnagar; John P. Kemp; John A. Morris; John A. Kanis; Douglas P. Kiel; Eugene McCloskey; Fernando Rivadeneira; Helena Johannson; Nicholas C. Harvey; C Cooper; David Evans; Joelle Pineau; William D. Leslie; Celia M. T. Greenwood; J. Brent Richards

Background: Genomics-based prediction could be useful since genome-wide genotyping costs less than many clinical tests. We tested whether machine learning methods could provide a clinically relevant genomic prediction of quantitative ultrasound speed of sound (SOS), a risk factor for osteoporotic fracture. Methods: We used 341,449 individuals from UK Biobank with SOS measures to develop genomically predicted SOS (gSOS) using machine learning algorithms. We selected the optimal algorithm in 5,335 independent individuals and then validated it and its ability to predict incident fracture in an independent test dataset (N = 80,027). Finally, we explored whether genomic pre-screening could complement a UK-based osteoporosis screening strategy, based on the validated tool FRAX. Results: gSOS explained 4.8-fold more variance in SOS than FRAX clinical risk factors (CRF) alone (r^2 = 23% vs. 4.8%). A standard deviation decrease in gSOS, adjusting for the CRF-FRAX score, was associated with a greater increase in the odds of incident major osteoporotic fracture (1,491 cases / 78,536 controls, OR = 1.91 [1.70-2.14], P = 10^-28) than that for measured SOS (OR = 1.60 [1.50-1.69], P = 10^-52) and femoral neck bone mineral density (147 cases / 4,594 controls, OR = 1.53 [1.27-1.83], P = 10^-6). Individuals in the bottom decile of the gSOS distribution had a 3.25-fold increased risk of major osteoporotic fracture (P = 10^-18) compared to the top decile. A gSOS-based FRAX score identified individuals at high risk for incident major osteoporotic fractures better than the CRF-FRAX score (P = 10^-14). Introducing a genomic pre-screening step into osteoporosis screening in 4,741 individuals reduced the number of required clinical visits from 2,455 to 1,273 and the number of BMD tests from 1,013 to 473, while only reducing the sensitivity to identify individuals eligible for therapy from 99% to 95%. Interpretation: The use of genotypes in a machine learning algorithm resulted in a clinically relevant prediction of SOS and fracture, with potential to impact healthcare resource utilization.

Research in Context. Evidence Before this Study: Genome-wide association studies have identified many loci associated with clinically relevant fracture risk factors, such as SOS. Yet, it is unclear if such information can be leveraged to identify those at risk for disease outcomes, such as osteoporotic fractures. Most previous attempts to predict disease risk from genotypes have used polygenic risk scores, which may not be optimal for genomic prediction. Despite these obstacles, genomic prediction could enable screening programs to be more efficient since most people screened in a population are not determined to have a level of risk that would prompt a change in clinical care. Genomic pre-screening could help identify individuals whose risk of disease is low enough that they are unlikely to benefit from screening. Added Value of this Study: Using a large dataset of 426,811 individuals we trained and tested a machine learning algorithm to genomically predict SOS. This metric, gSOS, had performance characteristics for predicting fracture risk that were similar to measured SOS and femoral neck BMD. Implementing a gSOS-based pre-screening step into the UK-based osteoporosis treatment guidelines reduced the number of individuals who would require screening clinical visits and skeletal testing by approximately 50%, while having little impact on the sensitivity to identify individuals at high risk for osteoporotic fracture. Implications of all of the Available Evidence: Clinically relevant genomic prediction of heritable traits is feasible using the machine learning algorithm presented here in large sample sizes. Genome-wide genotyping is now less expensive than many clinical tests, needs to be performed only once over a lifetime and could risk-stratify for multiple heritable traits and diseases years prior to disease onset, providing an opportunity for prevention. The implementation of such algorithms could improve screening efficiency, yet their cost-effectiveness will need to be ascertained in subsequent analyses.
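The headline figures in this abstract can be cross-checked with a few lines of arithmetic. The sketch below uses only numbers quoted in the abstract; the variable names are illustrative, and reading the 426,811 total as the sum of the training, selection and test sets is an assumption based on the fact that the three sizes add up to it.

```python
# Cohort sizes quoted in the abstract.
train_n, selection_n, test_n = 341_449, 5_335, 80_027
total_n = train_n + selection_n + test_n

# Variance explained: gSOS (23%) versus FRAX clinical risk factors (4.8%).
variance_ratio = 23 / 4.8

# Effect of genomic pre-screening on resource use in 4,741 individuals.
visit_reduction = (2_455 - 1_273) / 2_455
bmd_reduction = (1_013 - 473) / 1_013

print(f"total individuals: {total_n:,}")
print(f"variance ratio: {variance_ratio:.1f}-fold")
print(f"clinical visits reduced by {visit_reduction:.1%}")
print(f"BMD tests reduced by {bmd_reduction:.1%}")
```

The two reductions bracket the "approximately 50%" figure given under Added Value of this Study.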


Rheumatology | 2018

Widespread Epigenomic, Transcriptomic and Proteomic Differences Between Hip Osteophytic and Articular Patient Chondrocytes in Osteoarthritis

Julia Steinberg; Roger A. Brooks; Lorraine Southam; Sahir Bhatnagar; Theodoros Roumeliotis; Konstantinos Hatzikotoulas; Eleni Zengini; J.M. Wilkinson; Jyoti S. Choudhary; A. W. McCaskie; Eleftheria Zeggini

Objectives: To identify molecular differences between chondrocytes from osteophytic and articular cartilage tissue from OA patients. Methods: We investigated genes and pathways by combining genome-wide DNA methylation, RNA sequencing and quantitative proteomics in isolated primary chondrocytes from the cartilaginous layer of osteophytes and matched areas of low- and high-grade articular cartilage across nine patients with OA undergoing hip replacement surgery. Results: Chondrocytes from osteophytic cartilage showed widespread differences to low-grade articular cartilage chondrocytes. These differences were similar to, but more pronounced than, differences between chondrocytes from osteophytic and high-grade articular cartilage, and more pronounced than differences between high- and low-grade articular cartilage. We identified 56 genes with significant differences between osteophytic chondrocytes and low-grade articular cartilage chondrocytes on all three omics levels. Several of these genes have known roles in OA, including ALDH1A2 and cartilage oligomeric matrix protein, which have functional genetic variants associated with OA from genome-wide association studies. An integrative gene ontology enrichment analysis showed that differences between osteophytic and low-grade articular cartilage chondrocytes are associated with extracellular matrix organization, skeletal system development, platelet aggregation and regulation of ERK1 and ERK2 cascade. Conclusion: We present a first comprehensive view of the molecular landscape of chondrocytes from osteophytic cartilage as compared with articular cartilage chondrocytes from the same joints in OA. We found robust changes at genes relevant to chondrocyte function, providing insight into biological processes involved in osteophyte development and thus OA progression.


Journal of Gastrointestinal Surgery | 2018

A Comparison of Pathologic Outcomes of Open, Laparoscopic, and Robotic Resections for Rectal Cancer Using the ACS-NSQIP Proctectomy-Targeted Database: a Propensity Score Analysis

Richard Garfinkle; Maria Abou-Khalil; Sahir Bhatnagar; Nathalie Wong-Chong; Laurent Azoulay; Nancy Morin; Carol-Ann Vasilevsky; Marylise Boutros

Background: There is ongoing debate regarding the benefits of minimally invasive techniques for rectal cancer surgery. The aim of this study was to compare pathologic outcomes of patients who underwent rectal cancer resection by open surgery, laparoscopy, and robotic surgery using the American College of Surgeons National Surgical Quality Improvement Program (ACS-NSQIP) proctectomy-targeted database. Methods: All patients from the 2016 ACS-NSQIP proctectomy-targeted database who underwent elective proctectomy for rectal cancer were identified. Patients were divided into three groups based on initial operative approach: open surgery, laparoscopy, and robotic surgery. Pathologic and 30-day clinical outcomes were then compared between the groups. A propensity score analysis was performed to control for confounders, and adjusted odds ratios for pathologic outcomes were reported. Results: A total of 578 patients were included: 211 (36.5%) in the open group, 213 (36.9%) in the laparoscopic group, and 154 (26.6%) in the robotic group. Conversion to open surgery was more common among laparoscopic cases compared to robotic cases (15.0% vs. 6.5%, respectively; p = 0.011). Positive circumferential resection margin (CRM) was observed in 4.7%, 3.8%, and 5.2% (p = 0.79) of open, laparoscopic, and robotic resections, respectively. Propensity score adjusted odds ratios for positive CRM (open surgery as a reference group) were 0.70 (0.26–1.85, p = 0.47) for laparoscopy and 1.03 (0.39–2.70, p = 0.96) for robotic surgery. Conclusions: The use of minimally invasive surgical techniques for rectal cancer surgery does not appear to confer worse pathologic outcomes.


Genetic Epidemiology | 2018

An analytic approach for interpretable predictive models in high-dimensional data in the presence of interactions with exposures

Sahir Bhatnagar; Yi Yang; Budhachandra S. Khundrakpam; Alan C. Evans; Mathieu Blanchette; Luigi Bouchard; Celia M. T. Greenwood

Predicting a phenotype and understanding which variables improve that prediction are two very challenging and overlapping problems in the analysis of high‐dimensional (HD) data such as those arising from genomic and brain imaging studies. It is often believed that the number of truly important predictors is small relative to the total number of variables, making computational approaches to variable selection and dimension reduction extremely important. To reduce dimensionality, commonly used two‐step methods first cluster the data in some way, and build models using cluster summaries to predict the phenotype. It is known that important exposure variables can alter correlation patterns between clusters of HD variables, that is, alter network properties of the variables. However, it is not well understood whether such altered clustering is informative in prediction. Here, assuming there is a binary exposure with such network‐altering effects, we explore whether the use of exposure‐dependent clustering relationships in dimension reduction can improve predictive modeling in a two‐step framework. Hence, we propose a modeling framework called ECLUST to test this hypothesis, and evaluate its performance through extensive simulations. With ECLUST, we found improved prediction and variable selection performance compared to methods that do not consider the environment in the clustering step, or to methods that use the original data as features. We further illustrate this modeling framework through the analysis of three data sets from very different fields, each with HD data, a binary exposure, and a phenotype of interest. Our method is available in the eclust CRAN package.
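The two-step idea described above, clustering variables by how their correlations change with a binary exposure and then summarising each cluster, can be sketched in a few lines. This is a deliberately simplified illustration: it links two variables when their correlation differs between exposure groups by more than a threshold and takes connected components as clusters, which is a stand-in assumption rather than the clustering used by ECLUST, and none of the function names correspond to the eclust package.

```python
from math import sqrt


def pearson(x, y):
    """Pearson correlation of two equal-length, non-constant sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


def group_columns(X, exposure, level):
    """Columns of X restricted to the samples with the given exposure level."""
    p = len(X[0])
    return [[row[j] for row, e in zip(X, exposure) if e == level] for j in range(p)]


def exposure_dependent_clusters(X, exposure, threshold=0.5):
    """Cluster variables whose pairwise correlation shifts with the exposure.

    X: list of samples, each a list of p variable values.
    exposure: list of 0/1 values, one per sample.
    Returns a sorted list of clusters (lists of column indices).
    """
    p = len(X[0])
    exposed, unexposed = group_columns(X, exposure, 1), group_columns(X, exposure, 0)
    parent = list(range(p))  # union-find forest over variables

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    for i in range(p):
        for j in range(i + 1, p):
            shift = pearson(exposed[i], exposed[j]) - pearson(unexposed[i], unexposed[j])
            if abs(shift) > threshold:
                parent[find(i)] = find(j)
    clusters = {}
    for j in range(p):
        clusters.setdefault(find(j), []).append(j)
    return sorted(clusters.values())


def cluster_means(X, clusters):
    """Step-two features: the per-sample mean of each cluster's variables."""
    return [[sum(row[j] for j in c) / len(c) for c in clusters] for row in X]


# Hypothetical data: variables 0 and 1 are correlated only in exposed samples.
X = [[1, 1, 1], [2, 2, -1], [3, 3, -1], [4, 4, 1],
     [1, 1, 1], [2, -1, -1], [3, 1, -1], [4, -1, 1]]
exposure = [1, 1, 1, 1, 0, 0, 0, 0]
clusters = exposure_dependent_clusters(X, exposure)
features = cluster_means(X, clusters)
print(clusters)
```

The resulting cluster summaries would then be passed, together with the exposure, to whatever prediction model is used in the second step.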

Collaboration


Dive into Sahir Bhatnagar's collaborations.

Top Co-Authors

Karim Oualkacha

Université du Québec à Montréal

Nancy Morin

Jewish General Hospital

Eleftheria Zeggini

Wellcome Trust Sanger Institute
