Rafal Kustra
University of Toronto
Publication
Featured research published by Rafal Kustra.
Nature Genetics | 2007
Brent W. Zanke; Celia M. T. Greenwood; Jagadish Rangrej; Rafal Kustra; Albert Tenesa; Susan M. Farrington; James Prendergast; Sylviane Olschwang; Theodore Chiang; Edgar Crowdy; Vincent Ferretti; Philippe Laflamme; Saravanan Sundararajan; Stéphanie Roumy; Jean François Olivier; Frédérick Robidoux; Robert Sladek; Alexandre Montpetit; Peter J. Campbell; Stéphane Bézieau; Anne Marie O'Shea; George Zogopoulos; Michelle Cotterchio; Polly A. Newcomb; John R. McLaughlin; Ban Younghusband; Roger C. Green; Jane Green; Mary Porteous; Harry Campbell
Using a multistage genetic association approach comprising 7,480 affected individuals and 7,779 controls, we identified markers in chromosomal region 8q24 associated with colorectal cancer. In stage 1, we genotyped 99,632 SNPs in 1,257 affected individuals and 1,336 controls from Ontario. In stages 2–4, we performed serial replication studies using 4,024 affected individuals and 4,042 controls from Seattle, Newfoundland and Scotland. We identified one locus on chromosome 8q24 and another on 9p24 having combined odds ratios (OR) for stages 1–4 of 1.18 (trend; P = 1.41 × 10⁻⁸) and 1.14 (trend; P = 1.32 × 10⁻⁵), respectively. Additional analyses in 2,199 affected individuals and 2,401 controls from France and Europe supported the association at the 8q24 locus (OR = 1.16, trend; 95% confidence interval (c.i.): 1.07–1.26; P = 5.05 × 10⁻⁴). A summary across all seven studies at the 8q24 locus was highly significant (OR = 1.17, c.i.: 1.12–1.23; P = 3.16 × 10⁻¹¹). This locus has also been implicated in prostate cancer.
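The summary odds ratios above combine evidence across independent case-control samples. A minimal sketch of one common way to do this, inverse-variance fixed-effect pooling of log odds ratios, follows. The abstract does not state which summary method was used, so this is an illustration, not the authors' procedure. The helper names are hypothetical, and the standard error is recovered from a reported 95% c.i. under the assumption that the interval is symmetric on the log scale.

```python
import math

# Standard normal quantile for a two-sided 95% interval.
Z95 = 1.959963984540054


def log_or_and_se(odds_ratio, lower, upper):
    """Recover log(OR) and its standard error from an OR and its 95% CI.

    Hypothetical helper; assumes the interval is symmetric on the log scale.
    """
    return math.log(odds_ratio), (math.log(upper) - math.log(lower)) / (2 * Z95)


def fixed_effect_pool(estimates):
    """Inverse-variance fixed-effect pooling of (log_or, se) pairs.

    Returns the pooled OR, its 95% CI, and a two-sided Wald P value.
    """
    weights = [1.0 / se**2 for _, se in estimates]
    total = sum(weights)
    pooled = sum(w * b for w, (b, _) in zip(weights, estimates)) / total
    pooled_se = math.sqrt(1.0 / total)
    z = pooled / pooled_se
    # Two-sided normal tail probability: 2 * (1 - Phi(|z|)) = erfc(|z| / sqrt(2)).
    p_value = math.erfc(abs(z) / math.sqrt(2.0))
    ci = (math.exp(pooled - Z95 * pooled_se), math.exp(pooled + Z95 * pooled_se))
    return math.exp(pooled), ci, p_value


# Usage: the France/Europe 8q24 estimate reported in the abstract.
b, se = log_or_and_se(1.16, 1.07, 1.26)
print(f"log OR = {b:.3f}, SE = {se:.3f}")
```

For the France/Europe estimate this gives a log-scale standard error of roughly 0.042. The abstract does not list per-study estimates for all seven studies, so the seven-study summary cannot be recomputed from it with this sketch.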
Nature Genetics | 2008
Albert Tenesa; Susan M. Farrington; James Prendergast; Mary Porteous; Marion Walker; Naila Haq; Rebecca A. Barnetson; Evropi Theodoratou; Roseanne Cetnarskyj; Nicola Cartwright; Colin A. Semple; Andy Clark; Fiona Reid; Lorna Smith; Thibaud Koessler; Paul Pharoah; Stephan Buch; Clemens Schafmayer; Jürgen Tepel; Stefan Schreiber; Henry Völzke; Carsten Schmidt; Jochen Hampe; Jenny Chang-Claude; Michael Hoffmeister; Hermann Brenner; Stefan Wilkening; Federico Canzian; Gabriel Capellá; Victor Moreno
In a genome-wide association study to identify loci associated with colorectal cancer (CRC) risk, we genotyped 555,510 SNPs in 1,012 early-onset Scottish CRC cases and 1,012 controls (phase 1). In phase 2, we genotyped the 15,008 highest-ranked SNPs in 2,057 Scottish cases and 2,111 controls. We then genotyped the five highest-ranked SNPs from the joint phase 1 and 2 analysis in 14,500 cases and 13,294 controls from seven populations, and identified a previously unreported association, rs3802842 on 11q23 (OR = 1.1; P = 5.8 × 10⁻¹⁰), showing population differences in risk. We also replicated and fine-mapped associations at 8q24 (rs7014346; OR = 1.19; P = 8.6 × 10⁻²⁶) and 18q21 (rs4939827; OR = 1.2; P = 7.8 × 10⁻²⁸). Risk was greater for rectal than for colon cancer for rs3802842 (P < 0.008) and rs4939827 (P < 0.009). Carrying all six possible risk alleles yielded OR = 2.6 (95% CI = 1.75–3.89) for CRC. These findings extend our understanding of the role of common genetic variation in CRC etiology.
NeuroImage | 2000
Stephen C. Strother; Jon E. Anderson; Lars Kai Hansen; Ulrik Kjems; Rafal Kustra; John J. Sidtis; Sally Frutiger; Suraj Ashok Muley; Stephen M. LaConte; David A. Rottenberg
We introduce a data-analysis framework and performance metrics for evaluating and optimizing the interaction between activation tasks, experimental designs, and the methodological choices and tools for data acquisition, preprocessing, data analysis, and extraction of statistical parametric maps (SPMs). Our NPAIRS (nonparametric prediction, activation, influence, and reproducibility resampling) framework provides an alternative to simulations and ROC curves by using real PET and fMRI data sets to examine the relationship between prediction accuracy and the signal-to-noise ratios (SNRs) associated with reproducible SPMs. Using cross-validation resampling, we plot training-test set predictions of the experimental design variables (e.g., brain-state labels) versus reproducibility SNR metrics for the associated SPMs. We demonstrate the utility of this framework across the wide range of performance metrics obtained from [¹⁵O]water PET studies of 12 age- and sex-matched data sets performing different motor tasks (8 subjects/set). For the 12 data sets we apply NPAIRS with both univariate and multivariate data-analysis approaches to: (1) demonstrate that this framework may be used to obtain reproducible SPMs from any data-analysis approach on a common Z-score scale (rSPM[Z]); (2) demonstrate that the histogram of an rSPM[Z] image may be modeled as the sum of a data-analysis-dependent noise distribution and a task-dependent, Gaussian signal distribution that scales monotonically with our reproducibility performance metric; (3) explore the relation between prediction and reproducibility performance metrics with an emphasis on bias-variance tradeoffs for flexible, multivariate models; and (4) measure the broad range of reproducibility SNRs and the significant influence of individual subjects.
A companion paper describes learning curves for four of these 12 data sets, which describe an alternative mutual-information prediction metric and NPAIRS reproducibility as a function of training-set sizes from 2 to 18 subjects. We propose the NPAIRS framework as a validation tool for testing and optimizing methodological choices and tools in functional neuroimaging.
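The reproducibility side of NPAIRS can be illustrated with a split-half calculation. The sketch below assumes a formulation commonly used with NPAIRS that is not spelled out in this abstract: the reproducibility metric is the Pearson correlation r between SPMs computed on two independent halves of the data; the rSPM[Z] is the projection of each standardized voxel pair onto the major (signal) axis of their scatter plot, divided by the spread along the minor (noise) axis; and a global reproducibility SNR is sqrt(2r/(1 − r)). The function name `split_half_reproducibility` and the synthetic data are hypothetical; this is not the authors' code.

```python
import numpy as np


def split_half_reproducibility(spm_a, spm_b):
    """Split-half reproducibility for two SPMs from independent data halves.

    Hypothetical helper (not the NPAIRS authors' code). Returns
    (r, rspm_z, gsnr): the Pearson correlation between the maps, a
    reproducible SPM on a Z scale, and a global reproducibility SNR.
    """
    a = np.asarray(spm_a, dtype=float).ravel()
    b = np.asarray(spm_b, dtype=float).ravel()
    # Standardize each map across voxels (population SD, ddof=0).
    za = (a - a.mean()) / a.std()
    zb = (b - b.mean()) / b.std()
    # With ddof=0 z-scores, the mean of the products equals Pearson's r.
    r = np.mean(za * zb)
    # Rotate the (za, zb) scatter plot by 45 degrees: the major axis
    # carries the shared signal, the minor axis carries the noise.
    signal_axis = (za + zb) / np.sqrt(2.0)
    noise_axis = (za - zb) / np.sqrt(2.0)
    # rSPM[Z]: signal-axis projection scaled by the noise-axis spread.
    rspm_z = signal_axis / noise_axis.std()
    # Global SNR: signal variance 2r over noise variance 1 - r (needs 0 < r < 1).
    gsnr = np.sqrt(2.0 * r / (1.0 - r))
    return r, rspm_z, gsnr


# Usage with synthetic data: a shared "activation" plus independent noise.
rng = np.random.default_rng(0)
truth = rng.normal(size=20_000)
half_a = truth + rng.normal(size=truth.size)
half_b = truth + rng.normal(size=truth.size)
r, rspm_z, gsnr = split_half_reproducibility(half_a, half_b)
print(f"r = {r:.2f}, gSNR = {gsnr:.2f}")
```

For standardized maps the variances along the major and minor axes are 1 + r and 1 − r, so the variance attributable to reproducible signal is (1 + r) − (1 − r) = 2r; that is where the sqrt(2r/(1 − r)) form comes from. It is only defined for 0 < r < 1, and with equal signal and noise variance, as in the synthetic usage, r should come out near 0.5.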
Nature Structural & Molecular Biology | 2012
Tarang Khare; Shraddha Pai; Karolis Koncevičius; Mrinal Pal; Edita Kriukiene; Zita Liutkeviciute; Manuel Irimia; Peixin Jia; Carolyn Ptak; Menghang Xia; Raymond Tice; Mamoru Tochigi; Solange Moréra; Anaies Nazarians; Denise D. Belsham; Albert H.C. Wong; Benjamin J. Blencowe; Sun Chong Wang; Philipp Kapranov; Rafal Kustra; Viviane Labrie; Saulius Klimašauskas; Arturas Petronis
The 5-methylcytosine (5-mC) derivative 5-hydroxymethylcytosine (5-hmC) is abundant in the brain for unknown reasons. Here we characterize the genomic distribution of 5-hmC and 5-mC in human and mouse tissues. We assayed 5-hmC by using glucosylation coupled with restriction-enzyme digestion and microarray analysis. We detected 5-hmC enrichment in genes with synapse-related functions in both human and mouse brain. We also identified substantial tissue-specific differential distributions of these DNA modifications at the exon-intron boundary in human and mouse. This boundary change was mainly due to 5-hmC in the brain but due to 5-mC in non-neural contexts. This pattern was replicated in multiple independent data sets and with single-molecule sequencing. Moreover, in human frontal cortex, constitutive exons contained higher levels of 5-hmC relative to alternatively spliced exons. Our study suggests a new role for 5-hmC in RNA splicing and synaptic function in the brain.
Computer-Based Medical Systems | 2006
Rafal Kustra; Adam Zagdanski
In this paper we consider a general framework for clustering expression data that permits integration of various biological data sources by combining the corresponding dissimilarity measures. We briefly review published attempts at genomic data fusion and discuss the problem of validating results from clustering expression data. We apply our approach to a real microarray expression dataset, which induces a correlation-based dissimilarity matrix, and use Gene Ontology (GO) biological process annotations to derive a GO-based dissimilarity matrix. The proposed procedure is verified using a simple knowledge-based validation measure derived from a protein-protein interaction database. The results reveal that combining experimental data with a comprehensive and reliable biological repository may improve the performance of cluster analysis and yield biologically meaningful gene clusters.
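The fusion step can be sketched as a weighted combination of two dissimilarity matrices. This is a minimal illustration under stated assumptions. The paper's exact combination rule and GO-based measure are not given in the abstract, so the following are all assumptions: the convex combination, the correlation dissimilarity rescaled to [0, 1], and the Jaccard overlap of GO term sets, which stands in for a GO semantic measure. The function names, toy expression values, and GO term IDs are hypothetical.

```python
import numpy as np


def correlation_dissimilarity(expr):
    """(1 - Pearson r) / 2 between gene expression profiles (rows), in [0, 1]."""
    return (1.0 - np.corrcoef(expr)) / 2.0


def go_dissimilarity(annotations):
    """1 - Jaccard overlap of GO term sets: a simple stand-in GO-based measure."""
    n = len(annotations)
    d = np.zeros((n, n))
    for i in range(n):
        for j in range(i + 1, n):
            union = annotations[i] | annotations[j]
            shared = annotations[i] & annotations[j]
            # Pairs with no annotations at all are treated as maximally dissimilar.
            d[i, j] = d[j, i] = 1.0 - len(shared) / len(union) if union else 1.0
    return d


def fuse_dissimilarities(d_expr, d_go, weight):
    """Convex combination; weight plays the role of an infusion coefficient."""
    if not 0.0 <= weight <= 1.0:
        raise ValueError("weight must lie in [0, 1]")
    return (1.0 - weight) * d_expr + weight * d_go


# Hypothetical toy data: three genes, four arrays, made-up GO term IDs.
expr = np.array([[1.0, 2.0, 3.0, 4.0],
                 [2.0, 4.0, 6.0, 8.0],
                 [4.0, 3.0, 2.0, 1.0]])
go_terms = [{"GO:A", "GO:B"}, {"GO:A"}, {"GO:C"}]
fused = fuse_dissimilarities(correlation_dissimilarity(expr),
                             go_dissimilarity(go_terms), 0.3)
print(fused)
```

The fused matrix can then be passed to any dissimilarity-based clustering method. A weight of 0 recovers expression-only clustering, and larger weights pull together genes that share annotations.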
IEEE Transactions on Medical Imaging | 2001
Rafal Kustra; Stephen C. Strother
The authors propose a flexible, comprehensive approach for analysis of [¹⁵O]-water positron emission tomography (PET) brain images using a penalized version of linear discriminant analysis (PDA). They applied it to scans from 20 subjects (eight scans/subject) performing a finger movement task and analyzed: (1) two classes to obtain a covariance-normalized baseline-activation image, and (2) eight classes for the mean within-subject temporal structure, which contained baseline-activation and time-dependent changes in a two-dimensional canonical subspace. The authors imposed spatial smoothness on the resulting image(s) by expanding it in five tensor-product B-spline (TPS) bases of varying smoothness, and further regularized with a ridge-type penalty on the noise covariance matrix. The discrimination approach of PDA provides a probabilistic framework within which prediction error (PE) estimates are derived. The authors used these to optimize over TPS bases and a ridge hyperparameter (expressed as equivalent degrees of freedom, EDF). They obtained unbiased, low-variance PE estimates using modern resampling tools (.632+ bootstrap and cross-validation), and compared PDA of (1) TPS-projected, mean-normalized and unnormalized scans and (2) mean-normalized scans with and without additional presmoothing. By examining the tradeoffs between PE and EDF as a function of basis selection and image smoothing, the authors demonstrate the utility of PDA, the PE framework, and the relationship between singular value decomposition and smooth TPS bases in the analysis of functional neuroimages.
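Expressing a ridge hyperparameter as equivalent degrees of freedom can be illustrated with the standard ridge-regression definition EDF(λ) = Σ_j d_j² / (d_j² + λ), where d_j are the singular values of the data matrix. This is an assumption made for illustration: in PDA the ridge penalty acts on the noise covariance matrix, and the paper's exact EDF definition may differ. The function names, the log-scale bisection search, and the random data are hypothetical.

```python
import numpy as np


def ridge_edf(X, lam):
    """Effective degrees of freedom of a ridge fit: sum_j d_j^2 / (d_j^2 + lam)."""
    d = np.linalg.svd(X, compute_uv=False)
    return float(np.sum(d**2 / (d**2 + lam)))


def lam_for_edf(X, target, lo=1e-12, hi=1e12, iters=200):
    """Find the ridge penalty giving a target EDF by bisection on a log scale.

    EDF decreases monotonically in lam, from rank(X) at lam = 0 towards 0.
    """
    if not 0.0 < target < np.linalg.matrix_rank(X):
        raise ValueError("target EDF must lie strictly between 0 and rank(X)")
    for _ in range(iters):
        mid = np.sqrt(lo * hi)  # geometric midpoint of the current bracket
        if ridge_edf(X, mid) > target:
            lo = mid  # too little shrinkage: the penalty must grow
        else:
            hi = mid
    return float(np.sqrt(lo * hi))


# Hypothetical data: 40 observations of 10 basis coefficients.
rng = np.random.default_rng(1)
X = rng.normal(size=(40, 10))
lam = lam_for_edf(X, target=4.0)
print(lam, ridge_edf(X, lam))
```

Searching on EDF rather than on λ directly gives a scale-free axis for comparing fits across bases of different sizes, which is one reason to report the hyperparameter this way.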
BMC Bioinformatics | 2006
Rafal Kustra; Romy Shioda; Mu Zhu
Background: Expression array data are used to predict biological functions of uncharacterized genes by comparing their expression profiles to those of characterized genes. While biologically plausible, this is both statistically and computationally challenging. Typical approaches are computationally expensive and ignore correlations among expression profiles and functional categories. Results: We propose a factor analysis model (FAM) for functional genomics and give a two-step algorithm, using genome-wide expression data for yeast and a subset of Gene Ontology Biological Process functional annotations. We show that the predictive performance of our method is comparable to the current best approach while reducing total computation time by a factor of 4000. We discuss the unique challenges in performance evaluation of algorithms used for genome-wide functional genomics. Finally, we discuss extensions to our method that can incorporate the inherent correlation structure of the functional categories to further improve predictive performance. Conclusion: Our factor analysis model is a computationally efficient technique for functional genomics and provides a clear and unified statistical framework with potential for incorporating important Gene Ontology information to improve predictions.
Biological Psychiatry | 2015
Gabriel Oh; Sun Chong Wang; Mrinal Pal; Zheng Fei Chen; Tarang Khare; Mamoru Tochigi; Catherine Ng; Yeqing A. Yang; Andrew Kwan; Zachary Kaminsky; Jonathan Mill; Cerisse Gunasinghe; Jennifer L. Tackett; Irving I. Gottesman; G. Willemsen; Eco J. de Geus; Jacqueline M. Vink; P. Eline Slagboom; Naomi R. Wray; Andrew C. Heath; Grant W. Montgomery; Gustavo Turecki; Nicholas G. Martin; Dorret I. Boomsma; Peter McGuffin; Rafal Kustra; Art Petronis
BACKGROUND Major depressive disorder (MDD) exhibits numerous clinical and molecular features that are consistent with putative epigenetic misregulation. Despite growing interest in epigenetic studies of psychiatric diseases, the methodologies guiding such studies have not been well defined. METHODS We performed DNA modification analysis in white blood cells from monozygotic twins discordant for MDD, in brain prefrontal cortex, and germline (sperm) samples from affected individuals and control subjects (total N = 304) using 8.1K CpG island microarrays and fine mapping. In addition to the traditional locus-by-locus comparisons, we explored the potential of new analytical approaches in epigenomic studies. RESULTS In the microarray experiment, we detected a number of nominally significant DNA modification differences in MDD and validated selected targets using bisulfite pyrosequencing. Some MDD epigenetic changes, however, overlapped across brain, blood, and sperm more often than expected by chance. We also demonstrated that stratification for disease severity and age may increase the statistical power of epimutation detection. Finally, a series of new analytical approaches, such as DNA modification networks and machine-learning algorithms using binary and quantitative depression phenotypes, provided additional insights on the epigenetic contributions to MDD. CONCLUSIONS Mapping epigenetic differences in MDD (and other psychiatric diseases) is a complex task. However, combining traditional and innovative analytical strategies may lead to identification of disease-specific etiopathogenic epimutations.
IEEE/ACM Transactions on Computational Biology and Bioinformatics | 2010
Rafal Kustra; Adam Zagdanski
While clustering genes remains one of the most popular exploratory tools for expression data, it often results in highly variable and biologically uninformative clusters. This paper explores a data fusion approach to clustering microarray data. Our method, which combines expression data and Gene Ontology (GO)-derived information, is applied to a real data set to perform genome-wide clustering. A set of novel tools is proposed to validate the clustering results and choose an appropriate value of the infusion coefficient. These tools measure stability, biological relevance, and distance from the expression-only clustering solution. Our results indicate that data-fusion clustering leads to more stable, biologically relevant clusters that are still representative of the experimental data.
European Journal of Cancer | 2012
Doris Howell; Amna Husain; Hsien Seow; Ying Liu; Rafal Kustra; Clare L. Atzema; Deborah Dudgeon; Craig C. Earle; Jonathan Sussman; Lisa Barbera
BACKGROUND Cluster identification has emerged as a priority for symptom research. Variation in statistical approaches has hampered the identification of common clusters that should be targeted for intervention. The purpose of this study was to identify and validate common symptom clusters in a large population-based cohort of ambulatory cancer subjects. METHODS This descriptive, factor analysis study used bootstrap methods to derive a stable factor structure to identify symptom clusters in a population-based sample of cancer patients. Subjects were identified from a provincial symptom database and linked to other provincial databases. Symptom clusters were validated using confirmatory factor analysis in a randomly selected portion of the sample and model fit examined using common goodness of fit criteria. RESULTS The cluster cohort included 14,247 subjects. Three symptom clusters were identified: fatigue-sickness symptoms (tiredness, nausea, drowsiness and shortness of breath), emotional distress (depression and anxiety), and a poor sense of well-being (appetite and well-being). These clusters were stable across most sub-populations in the cohort. CONCLUSION The identification of common symptom clusters using robust statistical methods will help to yield targets to improve symptom management and identify populations at risk for worse disease outcomes.