Publication


Featured research published by Gad Abraham.


PLOS ONE | 2014

Fast Principal Component Analysis of Large-Scale Genome-Wide Data

Gad Abraham; Michael Inouye

Principal component analysis (PCA) is routinely used to analyze genome-wide single-nucleotide polymorphism (SNP) data, for detecting population structure and potential outliers. However, the size of SNP datasets has increased immensely in recent years and PCA of large datasets has become a time-consuming task. We have developed flashpca, a highly efficient PCA implementation based on randomized algorithms, which delivers identical accuracy in extracting the top principal components compared with existing tools, in substantially less time. We demonstrate the utility of flashpca on both HapMap3 and a large Immunochip dataset. For the latter, flashpca performed PCA of 15,000 individuals up to 125 times faster than existing tools, with identical results, and PCA of 150,000 individuals using flashpca completed in 4 hours. The increasing size of SNP datasets will make tools such as flashpca essential as traditional approaches will not adequately scale. This approach will also help to scale other applications that leverage PCA or eigen-decomposition to substantially larger datasets.
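As a rough illustration of the randomized approach mentioned in the abstract, the following is a minimal NumPy sketch of a generic randomized PCA: a random projection, a few power iterations, then an exact SVD of the small projected matrix. It is not flashpca's implementation; the function name `randomized_pca`, its parameters and defaults, and the toy data are all assumptions made for this example.

```python
import numpy as np


def randomized_pca(X, n_components, n_oversample=10, n_iter=4, seed=0):
    """Approximate the top principal components of X (samples x features).

    Generic randomized-SVD sketch; names and defaults are illustrative.
    """
    rng = np.random.default_rng(seed)
    Xc = X - X.mean(axis=0)  # centre each feature
    k = min(n_components + n_oversample, min(Xc.shape))

    # Project onto a random subspace to capture the dominant column space.
    Q = np.linalg.qr(Xc @ rng.standard_normal((Xc.shape[1], k)))[0]

    # Power iterations sharpen the subspace; QR keeps it numerically stable.
    for _ in range(n_iter):
        Q = np.linalg.qr(Xc.T @ Q)[0]
        Q = np.linalg.qr(Xc @ Q)[0]

    # Exact SVD of the small (k x features) projected matrix.
    U_small, s, Vt = np.linalg.svd(Q.T @ Xc, full_matrices=False)
    scores = (Q @ U_small)[:, :n_components] * s[:n_components]
    return scores, s[:n_components], Vt[:n_components]


# Toy data (hypothetical): 200 samples, 50 features, 2 latent factors.
rng = np.random.default_rng(1)
X = rng.standard_normal((200, 2)) @ rng.standard_normal((2, 50))
scores, singular_values, components = randomized_pca(X, n_components=2)

# Reference values from a full SVD, for comparison on this small example.
exact = np.linalg.svd(X - X.mean(axis=0), compute_uv=False)[:2]
```

The saving comes from only ever decomposing matrices with `k` columns or rows, where `k` is the number of requested components plus a small oversampling margin, rather than the full genotype matrix.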


Genetic Epidemiology | 2013

Performance and robustness of penalized and unpenalized methods for genetic prediction of complex human disease

Gad Abraham; Adam Kowalczyk; Justin Zobel; Michael Inouye

A central goal of medical genetics is to accurately predict complex disease from genotypes. Here, we present a comprehensive analysis of simulated and real data using lasso and elastic‐net penalized support‐vector machine models, a mixed‐effects linear model, a polygenic score, and unpenalized logistic regression. In simulation, the sparse penalized models achieved lower false‐positive rates and higher precision than the other methods for detecting causal SNPs. The common practice of prefiltering SNP lists for subsequent penalized modeling was examined and shown to substantially reduce the ability to recover the causal SNPs. Using genome‐wide SNP profiles across eight complex diseases within cross‐validation, lasso and elastic‐net models achieved substantially better predictive ability in celiac disease, type 1 diabetes, and Crohn's disease, and had equivalent predictive ability in the rest, with the results in celiac disease strongly replicating between independent datasets. We investigated the effect of linkage disequilibrium on the predictive models, showing that the penalized methods leverage this information to their advantage, compared with methods that assume SNP independence. Our findings show that sparse penalized approaches are robust across different disease architectures, producing as good as or better phenotype predictions and variance explained. This has fundamental ramifications for the selection and future development of methods to genetically predict human disease.


BMC Bioinformatics | 2010

Prediction of breast cancer prognosis using gene set statistics provides signature stability and biological context

Gad Abraham; Adam Kowalczyk; Sherene Loi; Izhak Haviv; Justin Zobel

Background: Different microarray studies have compiled gene lists for predicting outcomes of a range of treatments and diseases. These have produced gene lists that have little overlap, indicating that the results from any one study are unstable. It has been suggested that the underlying pathways are essentially identical, and that the expression of gene sets, rather than that of individual genes, may be more informative with respect to prognosis and understanding of the underlying biological process. Results: We sought to examine the stability of prognostic signatures based on gene sets rather than individual genes. We classified breast cancer cases from five microarray studies according to the risk of metastasis, using features derived from predefined gene sets. The expression levels of genes in the sets are aggregated, using what we call a set statistic. The resulting prognostic gene sets were as predictive as the lists of individual genes, but displayed more consistent rankings via bootstrap replications within datasets, produced more stable classifiers across different datasets, and are potentially more interpretable in the biological context since they examine gene expression in the context of their neighbouring genes in the pathway. In addition, we performed this analysis in each breast cancer molecular subtype, based on ER/HER2 status. The prognostic gene sets found in each subtype were consistent with the biology based on previous analysis of individual genes. Conclusions: To date, most analyses of gene expression data have focused at the level of the individual genes. We show that a complementary approach of examining the data using predefined gene sets can reduce the noise and could provide increased insight into the underlying biological pathways.
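The idea of a set statistic can be made concrete with a small Python sketch: the expression values of the genes in a predefined set are collapsed into one feature per sample. The mean is used here purely for illustration, since the abstract does not say which statistics the authors used; the gene names, set names, and expression values are hypothetical.

```python
from statistics import mean

# Hypothetical expression levels per sample: gene -> value.
samples = {
    "sample_1": {"GENE_A": 2.0, "GENE_B": 4.0, "GENE_C": 1.0, "GENE_D": 3.0},
    "sample_2": {"GENE_A": 6.0, "GENE_B": 8.0, "GENE_C": 2.0, "GENE_D": 4.0},
}

# Hypothetical predefined gene sets (for example, pathways).
gene_sets = {
    "set_1": ["GENE_A", "GENE_B"],
    "set_2": ["GENE_C", "GENE_D", "GENE_E"],  # GENE_E is not measured
}


def set_statistic(expression, genes, statistic=mean):
    """Aggregate the expression of the measured genes in one gene set."""
    values = [expression[g] for g in genes if g in expression]
    if not values:
        raise ValueError("no genes from this set were measured")
    return statistic(values)


# One feature per (sample, gene set); these would replace per-gene
# features as inputs to a downstream classifier.
features = {
    sample: {name: set_statistic(expr, genes) for name, genes in gene_sets.items()}
    for sample, expr in samples.items()
}
```

Passing a different callable as `statistic` (for instance `statistics.median`) swaps the aggregation without changing the rest of the pipeline.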


International Conference of the IEEE Engineering in Medicine and Biology Society | 2009

Short-Term Forecasting of Emergency Inpatient Flow

Gad Abraham; Graham Byrnes; Christopher Bain

Hospital managers have to manage resources effectively, while maintaining a high quality of care. For hospitals where admissions from the emergency department to the wards represent a large proportion of admissions, the ability to forecast these admissions and the resultant ward occupancy is especially useful for resource planning purposes. Since emergency admissions often compete with planned elective admissions, modeling emergency demand may result in improved elective planning as well. We compare several models for forecasting daily emergency inpatient admissions and occupancy. The models are applied to three years of daily data. By measuring their mean square error in a cross-validation framework, we find that emergency admissions are largely random, and hence unpredictable, whereas emergency occupancy can be forecast for up to one week ahead using either a model combining regression with an autoregressive integrated moving average (ARIMA) model, or a seasonal ARIMA model. Faced with variable admissions and occupancy, hospitals must prepare a reserve capacity of beds and staff. Our approach allows estimation of the required reserve capacity.


European Heart Journal | 2016

Genomic prediction of coronary heart disease

Gad Abraham; Aki S. Havulinna; Oneil G. Bhalala; Sean G. Byars; Alysha M. De Livera; Laxman Yetukuri; Emmi Tikkanen; Markus Perola; Heribert Schunkert; Eric J.G. Sijbrands; Aarno Palotie; Nilesh J. Samani; Veikko Salomaa; Samuli Ripatti; Michael Inouye

Aims: Genetics plays an important role in coronary heart disease (CHD) but the clinical utility of genomic risk scores (GRSs) relative to clinical risk scores, such as the Framingham Risk Score (FRS), is unclear. Our aim was to construct and externally validate a CHD GRS, in terms of lifetime CHD risk and relative to traditional clinical risk scores. Methods and results: We generated a GRS of 49 310 SNPs based on a CARDIoGRAMplusC4D Consortium meta-analysis of CHD, then independently tested it using five prospective population cohorts (three FINRISK cohorts, combined n = 12 676, 757 incident CHD events; two Framingham Heart Study cohorts (FHS), combined n = 3406, 587 incident CHD events). The GRS was associated with incident CHD (FINRISK HR = 1.74, 95% confidence interval (CI) 1.61–1.86 per S.D. of GRS; Framingham HR = 1.28, 95% CI 1.18–1.38), and was largely unchanged by adjustment for known risk factors, including family history. Integration of the GRS with the FRS or ACC/AHA13 scores improved the 10-year risk prediction (meta-analysis C-index: +1.5–1.6%, P < 0.001), particularly for individuals ≥60 years old (meta-analysis C-index: +4.6–5.1%, P < 0.001). Importantly, the GRS captured substantially different trajectories of absolute risk, with men in the top 20% of the GRS attaining 10% cumulative CHD risk 12–18 y earlier than those in the bottom 20%. High genomic risk was partially compensated for by low systolic blood pressure, low cholesterol level, and non-smoking. Conclusions: A GRS based on a large number of SNPs improves CHD risk prediction and encodes different trajectories of lifetime risk not captured by traditional clinical risk scores.
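A genomic risk score of this kind is commonly computed as a weighted sum of risk-allele dosages and then standardized, so that effects can be reported per S.D. The sketch below shows that calculation assuming a simple additive score; the weights and genotypes are made up, and the helper name `genomic_risk_score` is hypothetical rather than taken from the paper.

```python
from statistics import mean, pstdev

# Hypothetical per-SNP weights (e.g. log odds ratios from a meta-analysis).
weights = [0.10, -0.05, 0.20]

# Hypothetical risk-allele dosages (0, 1 or 2) for four individuals.
dosages = [
    [0, 1, 2],
    [1, 1, 0],
    [2, 0, 1],
    [1, 2, 1],
]


def genomic_risk_score(dosage_row, snp_weights):
    """Additive score: sum of weight * dosage over all SNPs."""
    return sum(w * d for w, d in zip(snp_weights, dosage_row))


raw = [genomic_risk_score(row, weights) for row in dosages]

# Standardize so one unit equals one S.D. of the score in this sample.
mu, sd = mean(raw), pstdev(raw)
standardized = [(s - mu) / sd for s in raw]

# Under a proportional-hazards model, a hazard ratio reported per S.D.
# scales multiplicatively: z S.D. above the mean corresponds to HR ** z.
hr_per_sd = 1.74  # FINRISK estimate quoted in the abstract
relative_hazard_at_plus_2sd = hr_per_sd ** 2
```

The last two lines only illustrate how a per-S.D. hazard ratio is read; they assume the log-linear relationship of a Cox model and say nothing about absolute risk.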


PLOS Genetics | 2014

Accurate and Robust Genomic Prediction of Celiac Disease Using Statistical Learning

Gad Abraham; Jason A. Tye-Din; Oneil G. Bhalala; Adam Kowalczyk; Justin Zobel; Michael Inouye

Practical application of genomic-based risk stratification to clinical diagnosis is appealing yet performance varies widely depending on the disease and genomic risk score (GRS) method. Celiac disease (CD), a common immune-mediated illness, is strongly genetically determined and requires specific HLA haplotypes. HLA testing can exclude diagnosis but has low specificity, providing little information suitable for clinical risk stratification. Using six European cohorts, we provide a proof-of-concept that statistical learning approaches which simultaneously model all SNPs can generate robust and highly accurate predictive models of CD based on genome-wide SNP profiles. The high predictive capacity replicated both in cross-validation within each cohort (AUC of 0.87–0.89) and in independent replication across cohorts (AUC of 0.86–0.9), despite differences in ethnicity. The models explained 30–35% of disease variance and up to ∼43% of heritability. The GRS's utility was assessed in different clinically relevant settings. Comparable to HLA typing, the GRS can be used to identify individuals without CD with ≥99.6% negative predictive value; however, unlike HLA typing, fine-scale stratification of individuals into categories of higher risk for CD can identify those that would benefit from more invasive and costly definitive testing. The GRS is flexible and its performance can be adapted to the clinical situation by adjusting the threshold cut-off. Despite explaining a minority of disease heritability, our findings indicate a genomic risk score provides clinically relevant information to improve upon current diagnostic pathways for CD and support further studies evaluating the clinical utility of this approach in CD and other complex diseases.
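How moving the cut-off trades negative predictive value against positive predictive value can be illustrated with a short Python sketch. The scores and case labels below are invented, and `classify_metrics` is a hypothetical helper, not part of any published tool.

```python
def classify_metrics(scores, is_case, threshold):
    """Confusion-matrix summaries when score >= threshold is called positive."""
    pairs = list(zip(scores, is_case))
    tp = sum(s >= threshold and c for s, c in pairs)
    fp = sum(s >= threshold and not c for s, c in pairs)
    fn = sum(s < threshold and c for s, c in pairs)
    tn = sum(s < threshold and not c for s, c in pairs)
    nan = float("nan")
    return {
        "sensitivity": tp / (tp + fn) if tp + fn else nan,
        "specificity": tn / (tn + fp) if tn + fp else nan,
        "npv": tn / (tn + fn) if tn + fn else nan,
        "ppv": tp / (tp + fp) if tp + fp else nan,
    }


# Hypothetical risk scores and true disease status for eight individuals.
scores = [0.1, 0.2, 0.3, 0.4, 0.6, 0.7, 0.8, 0.9]
is_case = [False, False, False, True, False, True, True, True]

# A low cut-off favours ruling disease out (high NPV, more false positives).
rule_out = classify_metrics(scores, is_case, threshold=0.35)

# A high cut-off favours selecting people for definitive testing (high PPV).
rule_in = classify_metrics(scores, is_case, threshold=0.65)
```

On this toy data the low cut-off misses no cases, while the high cut-off flags only true cases at the price of missing one.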


Current Opinion in Genetics & Development | 2015

Genomic risk prediction of complex human disease and its clinical application

Gad Abraham; Michael Inouye

Recent advances in genome-wide association studies have stimulated interest in the genomic prediction of disease risk, potentially enabling individual-level risk estimates for early intervention and improved diagnostic procedures. Here, we review recent findings and approaches to genomic prediction model construction and performance, then contrast the potential benefits of such models in two complex human diseases, aiding diagnosis in celiac disease and prospective risk prediction for cardiovascular disease. Early indications are that optimal application of genomic risk scores will differ substantially for each disease depending on underlying genetic architecture as well as current clinical and public health practice. As costs decline, genomic profiles become common, and popular understanding of risk and its communication improves, genomic risk will become increasingly useful for the individual and the clinician.


BMC Bioinformatics | 2012

SparSNP: Fast and memory-efficient analysis of all SNPs for phenotype prediction

Gad Abraham; Adam Kowalczyk; Justin Zobel; Michael Inouye

Background: A central goal of genomics is to predict phenotypic variation from genetic variation. Fitting predictive models to genome-wide and whole genome single nucleotide polymorphism (SNP) profiles allows us to estimate the predictive power of the SNPs and potentially develop diagnostic models for disease. However, many current datasets cannot be analysed with standard tools due to their large size. Results: We introduce SparSNP, a tool for fitting lasso linear models for massive SNP datasets quickly and with very low memory requirements. In analysis on a large celiac disease case/control dataset, we show that SparSNP runs substantially faster than four other state-of-the-art tools for fitting large scale penalised models. SparSNP was one of only two tools that could successfully fit models to the entire celiac disease dataset, and it did so with superior performance. Compared with the other tools, the models generated by SparSNP had equal or better predictive performance in cross-validation. Conclusions: Genomic datasets are rapidly increasing in size, rendering existing approaches to model fitting impractical due to their prohibitive time or memory requirements. This study shows that SparSNP is an essential addition to the genomic analysis toolkit. SparSNP is available at http://www.genomics.csse.unimelb.edu.au/SparSNP


PLOS Genetics | 2017

Genetic loci associated with coronary artery disease harbor evidence of selection and antagonistic pleiotropy

Sean G. Byars; Qin Qin Huang; Lesley-Ann Gray; Andrew Bakshi; Samuli Ripatti; Gad Abraham; Stephen C. Stearns; Michael Inouye

Traditional genome-wide scans for positive selection have mainly uncovered selective sweeps associated with monogenic traits. While selection on quantitative traits is much more common, very few signals have been detected because of their polygenic nature. We searched for positive selection signals underlying coronary artery disease (CAD) in worldwide populations, using novel approaches to quantify relationships between polygenic selection signals and CAD genetic risk. We identified new candidate adaptive loci that appear to have been directly modified by disease pressures given their significant associations with CAD genetic risk. These candidates were all uniquely and consistently associated with many different male and female reproductive traits suggesting selection may have also targeted these because of their direct effects on fitness. We found that CAD loci are significantly enriched for lifetime reproductive success relative to the rest of the human genome, with evidence that the relationship between CAD and lifetime reproductive success is antagonistic. This supports the presence of antagonistic-pleiotropic tradeoffs on CAD loci and provides a novel explanation for the maintenance and high prevalence of CAD in modern humans. Lastly, we found that positive selection more often targeted CAD gene regulatory variants using HapMap3 lymphoblastoid cell lines, which further highlights the unique biological significance of candidate adaptive loci underlying CAD. Our study provides a novel approach for detecting selection on polygenic traits and evidence that modern human genomes have evolved in response to CAD-induced selection pressures and other early-life traits sharing pleiotropic links with CAD.


Bioinformatics | 2017

FlashPCA2: principal component analysis of Biobank-scale genotype datasets

Gad Abraham; Yixuan Qiu; Michael Inouye

Motivation: Principal component analysis (PCA) is a crucial step in quality control of genomic data and a common approach for understanding population genetic structure. With the advent of large genotyping studies involving hundreds of thousands of individuals, standard approaches are no longer feasible. However, when the full decomposition is not required, substantial computational savings can be made. Results: We present FlashPCA2, a tool that can perform partial PCA on 1 million individuals faster than competing approaches, while requiring substantially less memory. Availability and implementation: https://github.com/gabraham/flashpca. Contact: [email protected] Supplementary information: Supplementary data are available at Bioinformatics online.

Collaboration


Researchers who have co-authored work with Gad Abraham.

Top Co-Authors

Adam Kowalczyk
Warsaw University of Technology

Justin Zobel
University of Melbourne

Aki S. Havulinna
National Institute for Health and Welfare

Markus Perola
National Institute for Health and Welfare

Veikko Salomaa
National Institute for Health and Welfare