Publication


Featured research published by Jussi Gillberg.


Bioinformatics | 2014

Assessing multivariate gene-metabolome associations with rare variants using Bayesian reduced rank regression

Pekka Marttinen; Matti Pirinen; Antti-Pekka Sarin; Jussi Gillberg; Johannes Kettunen; Ida Surakka; Antti J. Kangas; Pasi Soininen; Paul F. O’Reilly; Marika Kaakinen; Mika Kähönen; Terho Lehtimäki; Mika Ala-Korpela; Olli T. Raitakari; Veikko Salomaa; Marjo-Riitta Järvelin; Samuli Ripatti; Samuel Kaski

Motivation: A typical genome-wide association study searches for associations between single nucleotide polymorphisms (SNPs) and a univariate phenotype. However, there is growing interest in investigating associations between genomics data and multivariate phenotypes, for example, in gene expression or metabolomics studies. A common approach is to perform a univariate test between each genotype–phenotype pair, and then to apply a stringent significance cutoff to account for the large number of tests performed. However, this approach has limited ability to uncover dependencies involving multiple variables. Another trend in current genetics is the investigation of the impact of rare variants on the phenotype, where the standard methods often fail owing to lack of power when the minor allele is present in only a limited number of individuals. Results: We propose a new statistical approach based on Bayesian reduced rank regression to assess the impact of multiple SNPs on a high-dimensional phenotype. Because of the method’s ability to combine information over multiple SNPs and phenotypes, it is particularly suitable for detecting associations involving rare variants. We demonstrate the potential of our method and compare it with alternatives using the Northern Finland Birth Cohort with 4702 individuals, for whom genome-wide SNP data along with lipoprotein profiles comprising 74 traits are available. We discovered two genes (XRCC4 and MTHFD2L) without previously reported associations, which replicated in a combined analysis of two additional cohorts: 2390 individuals from the Cardiovascular Risk in Young Finns study and 3659 individuals from the FINRISK study. Availability and implementation: R code is freely available for download at http://users.ics.aalto.fi/pemartti/gene_metabolome/. Supplementary information: Supplementary data are available at Bioinformatics online.
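The reduced rank idea in the abstract above can be illustrated with a minimal sketch. This is the classical least-squares reduced rank regression, not the Bayesian model the authors propose, and it is unrelated to their R code. The function name, the simulated dimensions, and the noise level are all assumptions made for illustration.

```python
import numpy as np

def reduced_rank_regression(X, Y, rank):
    """Classical (non-Bayesian) reduced rank regression with identity weighting.

    Fits Y ~ X @ B subject to rank(B) <= rank by projecting the ordinary
    least-squares solution onto the leading right singular vectors of the
    fitted values.
    """
    B_ols, *_ = np.linalg.lstsq(X, Y, rcond=None)      # unconstrained fit
    Y_hat = X @ B_ols                                  # fitted phenotypes
    _, _, Vt = np.linalg.svd(Y_hat, full_matrices=False)
    V_r = Vt[:rank].T                                  # top-`rank` phenotype directions
    return B_ols @ V_r @ V_r.T                         # rank-constrained coefficients

# Hypothetical simulated data: 10 predictors (e.g. SNPs), 30 phenotype traits,
# and a true coefficient matrix of rank 2.
rng = np.random.default_rng(0)
n, p, q, r = 200, 10, 30, 2
X = rng.normal(size=(n, p))
B_true = rng.normal(size=(p, r)) @ rng.normal(size=(r, q))
Y = X @ B_true + 0.1 * rng.normal(size=(n, q))

B_hat = reduced_rank_regression(X, Y, rank=r)
rel_error = np.linalg.norm(B_hat - B_true) / np.linalg.norm(B_true)
print("rank of estimate:", np.linalg.matrix_rank(B_hat))
print("relative error:", rel_error)
```

The low-rank constraint is what lets information be pooled across many correlated traits: each predictor influences the phenotype vector only through a few shared directions.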


Statistical Applications in Genetics and Molecular Biology | 2013

Genome-wide association studies with high-dimensional phenotypes

Pekka Marttinen; Jussi Gillberg; Aki S. Havulinna; Jukka Corander; Samuel Kaski

High-dimensional phenotypes hold promise for richer findings in association studies, but testing of several phenotype traits aggravates the grand challenge of association studies, that of multiple testing. Several methods have recently been proposed for jointly testing all traits in a high-dimensional vector of phenotypes, with the prospect of increased power to detect small effects that would be missed if tested individually. However, the methods have rarely been compared to the extent of enabling assessment of their relative merits and setting up guidelines on which method to use, and how to use it. We compare the methods on simulated data and with a real metabolomics data set comprising 137 highly correlated variables and approximately 550,000 SNPs. Applying the methods to genome-wide data with hundreds of thousands of markers inevitably requires division of the problem into manageable parts facilitating parallel processing, parts corresponding to individual genetic variants, pathways, or genes, for example. Here we utilize a straightforward formulation according to which the genome is divided into blocks of nearby correlated genetic markers, tested jointly for association with the phenotypes. This formulation is computationally feasible, reduces the number of tests, and lets the methods take advantage of combining information over several correlated variables not only on the phenotype side, but also on the genotype side. Our experiments show that canonical correlation analysis has higher power than alternative methods, while remaining computationally tractable for routine use in the GWAS setting, provided the number of samples is sufficient compared to the numbers of phenotype and genotype variables tested. Sparse canonical correlation analysis and regression models with latent confounding factors show promising performance when the number of samples is small compared to the dimensionality of the data.
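As a rough illustration of testing one block of markers jointly against a phenotype vector, the sketch below computes sample canonical correlations with NumPy. It is not taken from the paper: the function name, the simulated genotype block, and the effect size are assumptions, and the QR-based computation requires more samples than variables and full column rank in both matrices.

```python
import numpy as np

def canonical_correlations(X, Y):
    """Sample canonical correlations between the columns of X and Y.

    Uses the QR/SVD formulation: after centering, the singular values of
    Qx.T @ Qy are the canonical correlations, in descending order.
    """
    Xc = X - X.mean(axis=0)
    Yc = Y - Y.mean(axis=0)
    Qx, _ = np.linalg.qr(Xc)   # orthonormal basis for the genotype block
    Qy, _ = np.linalg.qr(Yc)   # orthonormal basis for the phenotypes
    s = np.linalg.svd(Qx.T @ Qy, compute_uv=False)
    return np.clip(s, 0.0, 1.0)  # guard against rounding slightly above 1

# Hypothetical block of 5 markers coded as minor-allele counts (0, 1, 2)
# and 8 phenotype traits; only the first trait depends on the first marker.
rng = np.random.default_rng(1)
genotypes = rng.binomial(2, 0.3, size=(500, 5)).astype(float)
phenotypes = rng.normal(size=(500, 8))
phenotypes[:, 0] += 1.0 * genotypes[:, 0]

rho = canonical_correlations(genotypes, phenotypes)
print("canonical correlations:", rho)
```

Running the same function once per block keeps each test small and makes the genome-wide scan easy to parallelize, which is the kind of division into manageable parts the abstract describes.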


PLOS ONE | 2017

Cluster analysis to estimate the risk of preeclampsia in the high-risk Prediction and Prevention of Preeclampsia and Intrauterine Growth Restriction (PREDO) study

Pia M. Villa; Pekka Marttinen; Jussi Gillberg; A. Inkeri Lokki; Kerttu Majander; Maija Riitta Ordén; Pekka Taipale; Anu-Katriina Pesonen; Katri Räikkönen; Esa Hämäläinen; Eero Kajantie; Hannele Laivuori

Objectives: Preeclampsia is divided into early-onset (delivery before 34 weeks of gestation) and late-onset (delivery at or after 34 weeks) subtypes, which may arise from different etiopathogenic backgrounds. Early-onset disease is associated with placental dysfunction. Late-onset disease develops predominantly due to metabolic disturbances, obesity, diabetes, lipid dysfunction, and inflammation, which affect endothelial function. Our aim was to use cluster analysis to investigate clinical factors predicting the onset and severity of preeclampsia in a cohort of women with known clinical risk factors. Methods: We recruited 903 pregnant women with risk factors for preeclampsia at gestational weeks 12+0–13+6. Each individual outcome diagnosis was independently verified from medical records. We applied a Bayesian clustering algorithm to classify the study participants into clusters based on their particular risk factor combination. For each cluster, we computed the risk ratio of each disease outcome, relative to the risk in the general population. Results: The risk of preeclampsia increased exponentially with respect to the number of risk factors. Our analysis revealed 25 clusters. Preeclampsia in a previous pregnancy (n = 138) increased the risk of preeclampsia 8.1-fold (95% confidence interval (CI) 5.7–11.2) compared to a general population of pregnant women. Having a small for gestational age infant (n = 57) in a previous pregnancy increased the risk of early-onset preeclampsia 17.5-fold (95% CI 2.1–60.5). The cluster with those two risk factors together (n = 21) increased the risk of severe preeclampsia 23.8-fold (95% CI 5.1–60.6), of intermediate-onset preeclampsia (delivery between 34+0 and 36+6 weeks of gestation) 25.1-fold (95% CI 3.1–79.9), and of preterm preeclampsia (delivery before 37+0 weeks of gestation) 16.4-fold (95% CI 2.0–52.4). Body mass index over 30 kg/m² (n = 228) as a sole risk factor increased the risk of preeclampsia 2.1-fold (95% CI 1.1–3.6). Together with preeclampsia in an earlier pregnancy, the risk increased 11.4-fold (95% CI 4.5–20.9). Chronic hypertension (n = 60) increased the risk of preeclampsia 5.3-fold (95% CI 2.4–9.8), of severe preeclampsia 22.2-fold (95% CI 9.9–41.0), and of early-onset preeclampsia 16.7-fold (95% CI 2.0–57.6). If a woman had chronic hypertension combined with obesity, gestational diabetes, and earlier preeclampsia, the risk of term preeclampsia increased 4.8-fold (95% CI 0.1–21.7). Women with type 1 diabetes mellitus had a high risk of all subgroups of preeclampsia. Conclusion: The risk of preeclampsia increases exponentially with respect to the number of risk factors. Early-onset preeclampsia and severe preeclampsia have a different risk profile from term preeclampsia.
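The risk ratios quoted above compare the outcome risk within a cluster to a reference risk. The abstract does not say how the intervals were obtained, so the sketch below uses a standard textbook approach instead (a risk ratio with a log-scale Wald interval) on entirely hypothetical counts. The function name and the numbers are assumptions and are not taken from the PREDO data.

```python
import math

def risk_ratio(events_group, n_group, events_ref, n_ref, z=1.96):
    """Risk ratio of a group versus a reference, with a log-scale Wald interval.

    This is the common large-sample approximation; it is not necessarily
    the interval method used in the study.
    """
    risk_group = events_group / n_group
    risk_ref = events_ref / n_ref
    rr = risk_group / risk_ref
    # Standard error of log(RR) for two independent binomial samples.
    se_log = math.sqrt(1 / events_group - 1 / n_group
                       + 1 / events_ref - 1 / n_ref)
    lower = math.exp(math.log(rr) - z * se_log)
    upper = math.exp(math.log(rr) + z * se_log)
    return rr, lower, upper

# Hypothetical counts: 20 cases among 100 women in a cluster,
# 50 cases among 1000 women in a reference population.
rr, lower, upper = risk_ratio(20, 100, 50, 1000)
print(f"RR = {rr:.1f}, 95% CI {lower:.1f} to {upper:.1f}")
```

With small clusters the interval becomes very wide, which is consistent with the broad confidence intervals reported for the smaller risk-factor combinations.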


Neurocomputing | 2013

Transfer learning using a nonparametric sparse topic model

Ali Faisal; Jussi Gillberg; Gayle Leen; Jaakko Peltonen

In many domains, data items are represented by vectors of counts; count data arises, for example, in bioinformatics or in the analysis of text documents represented as word count vectors. However, the amount of data available from an interesting data source is often too small to model the data source well. When several data sets are available from related sources, exploiting their similarities by transfer learning can improve the resulting models compared to modeling sources independently. We introduce a Bayesian generative transfer learning model which represents similarity across document collections by sparse sharing of latent topics controlled by an Indian buffet process. Unlike a prominent previous model, hierarchical Dirichlet process (HDP) based multi-task learning, our model decouples topic sharing probability from topic strength, making sharing of low-strength topics easier. In experiments, our model outperforms the HDP approach both on synthetic data and in the first of the two case studies on text collections, and achieves performance similar to the HDP approach in the second case study.


bioRxiv | 2017

Modelling G×E with historical weather information improves genomic prediction in new environments

Jussi Gillberg; Pekka Marttinen; Hiroshi Mamitsuka; Samuel Kaski

Interaction between the genotype and the environment (G×E) has a strong impact on the yield of major crop plants. Although influential, taking G×E explicitly into account in plant breeding has remained difficult. Recently G×E has been predicted from environmental and genomic covariates, but existing works have not shown that generalization to new environments and years without access to in-season data is possible, and practical applicability remains unclear. Using data from a barley breeding program in Finland, we construct an in-silico experiment to study the viability of G×E prediction under practical constraints. We show that the response to the environment of a new generation of untested barley cultivars can be predicted in new locations and years using genomic data, machine learning and historical weather observations for the new locations. Our results highlight the need for models of G×E: non-linear effects clearly dominate linear ones and the interaction between the soil type and daily rain is identified as the main driver for G×E for barley in Finland. Our study implies that genomic selection can be used to capture the yield potential in G×E effects for future growth seasons, providing a possible means to achieve the yield improvements needed for feeding the growing population.


The European Symposium on Artificial Neural Networks | 2012

Sparse Nonparametric Topic Model for Transfer Learning

Ali Faisal; Jussi Gillberg; Jaakko Peltonen; Gayle Leen; Samuel Kaski


Journal of Machine Learning Research | 2016

Multiple output regression with latent noise

Jussi Gillberg; Pekka Marttinen; Matti Pirinen; Antti J. Kangas; Pasi Soininen; Mehreen Ali; Aki S. Havulinna; Marjo-Riitta Järvelin; Mika Ala-Korpela; Samuel Kaski


Archive | 2014

Assessing multivariate gene-metabolome associations with rare variants using Bayesian reduced rank

Matti Pirinen; Antti-Pekka Sarin; Jussi Gillberg; Johannes Kettunen; Samuli Ripatti; Samuel Kaski


WOS | 2017

Multiple Output Regression with Latent Noise

Jussi Gillberg; Pekka Marttinen; Matti Pirinen; Antti J. Kangas; Pasi Soininen; Mehreen Ali; Aki S. Havulinna; Marjo-Riitta Järvelin; Mika Ala-Korpela; Samuel Kaski


arXiv: Machine Learning | 2013

Bayesian Information Sharing Between Noise And Regression Models Improves Prediction of Weak Effects.

Jussi Gillberg; Pekka Marttinen; Matti Pirinen; Antti J. Kangas; Pasi Soininen; Marjo-Riitta Järvelin; Mika Ala-Korpela; Samuel Kaski

Collaboration


Top Co-Authors

Pekka Marttinen

Helsinki Institute for Information Technology

Pasi Soininen

University of Eastern Finland

Aki S. Havulinna

National Institute for Health and Welfare