Publication


Featured research published by Mikko J. Sillanpää.


Bioinformatics | 2004

BAPS 2: enhanced possibilities for the analysis of genetic population structure

Jukka Corander; Patrik Waldmann; Pekka Marttinen; Mikko J. Sillanpää

Bayesian statistical methods based on simulation techniques have recently been shown to provide powerful tools for the analysis of genetic population structure. We have previously developed a Markov chain Monte Carlo (MCMC) algorithm for characterizing genetically divergent groups based on molecular markers and geographical sampling design of the dataset. However, for large-scale datasets such algorithms may become stuck in local maxima in the parameter space. Therefore, we have modified our earlier algorithm to support multiple parallel MCMC chains, with enhanced features that enable considerably faster and more reliable estimation compared to the earlier version of the algorithm. We also consider a hierarchical tree representation, from which a Bayesian model-averaged structure estimate can be extracted. The algorithm is implemented in a computer program that features a user-friendly interface and built-in graphics. The enhanced features are illustrated by analyses of simulated data and an extensive human molecular dataset. Availability: freely available at http://www.rni.helsinki.fi/~jic/bapspage.html.


Genome Research | 2010

Multiplexed massively parallel SELEX for characterization of human transcription factor binding specificities

Arttu Jolma; Teemu Kivioja; Jarkko Toivonen; Lu Cheng; Gong-Hong Wei; Martin Enge; Mikko Taipale; Juan M. Vaquerizas; Jian Yan; Mikko J. Sillanpää; Martin Bonke; Kimmo Palin; Shaheynoor Talukder; Timothy Hughes; Nicholas M. Luscombe; Esko Ukkonen; Jussi Taipale

The genetic code, the binding specificity of all transfer RNAs, defines how protein primary structure is determined by DNA sequence. DNA also dictates when and where proteins are expressed, and this information is encoded in a pattern of specific sequence motifs that are recognized by transcription factors. However, the DNA-binding specificity is only known for a small fraction of the approximately 1400 human transcription factors (TFs). We describe here a high-throughput method for analyzing transcription factor binding specificity that is based on systematic evolution of ligands by exponential enrichment (SELEX) and massively parallel sequencing. The method is optimized for analysis of large numbers of TFs in parallel through the use of affinity-tagged proteins, barcoded selection oligonucleotides, and multiplexed sequencing. Data are analyzed by a new bioinformatic platform that uses the hundreds of thousands of sequencing reads obtained to control the quality of the experiments and to generate binding motifs for the TFs. The described technology allows higher throughput and identification of much longer binding profiles than current microarray-based methods. In addition, as our method is based on proteins expressed in mammalian cells, it can also be used to characterize DNA-binding preferences of full-length proteins or proteins requiring post-translational modifications. We validate the method by determining binding specificities of 14 different classes of TFs and by confirming the specificities for NFATC1 and RFX3 using ChIP-seq. Our results reveal unexpected dimeric modes of binding for several factors that were thought to preferentially bind DNA as monomers.


Trends in Genetics | 2002

Model choice in gene mapping: what and why

Mikko J. Sillanpää; Jukka Corander

The choice of an appropriate genetic model describing the genetic architecture underlying a character of interest is an inherent part of gene mapping studies of humans and other living organisms. The genetic model specifies the statistical parameters for the number of genes, their positions, and the types and magnitudes of their contributions to the phenotype. There are many considerations involved in model formulation (choice), ranging from the assumptions concerning the data and the role of environment to the number of oligogenes (or quantitative trait loci) influencing the trait behavior. There are several model selection procedures and criteria under specific sampling designs in the genetic literature. These approaches often have their origin in computer science or in general statistical theory. Our aim here is to give an overview of the most popular statistical criteria and to present the principles behind them. Bayesian model averaging is suggested as a robust alternative to such methods.


Theoretical and Applied Genetics | 2002

Multiple QTL mapping in related plant populations via a pedigree-analysis approach

Marco C. A. M. Bink; Pekka Uimari; Mikko J. Sillanpää; L.L.G. Janss; Ritsert C. Jansen

QTL mapping experiments in plant breeding may involve multiple populations or pedigrees that are related through their ancestors. These known relationships have often been ignored for the sake of statistical analysis, despite their potential to increase the power of mapping. We describe here a Bayesian method for QTL mapping in complex plant populations and report the results from its application to a (previously analysed) potato data set. This Bayesian method was originally developed for human genetics data, and we have proved that it is useful for complex plant populations as well, based on a sensitivity analysis that was performed here. The method accommodates robustness to complex structures in pedigree data, full flexibility in the estimation of the number of QTL across multiple chromosomes, thereby accounting for uncertainties in the transmission of QTL and marker alleles due to incomplete marker information, and the simultaneous inclusion of non-genetic factors affecting the quantitative trait.


Theoretical and Applied Genetics | 1997

Genetic basis of adaptation: flowering time in Arabidopsis thaliana

H. Kuittinen; Mikko J. Sillanpää; Outi Savolainen

We have mapped QTLs (quantitative trait loci) for an adaptive trait, flowering time, in a selfing annual, Arabidopsis thaliana. To obtain a mapping population we made a cross between an early-summer, annual strain, Li-5, and an individual from a late over-wintering natural population, Naantali. From the backcross to Li-5, 298 progeny were grown, of which 93 of the most extreme individuals were genotyped. The data were analysed with both interval mapping and composite interval mapping methods to reveal one major and six minor QTLs, with at least one QTL on each of the five chromosomes. The QTL on chromosome 4 was a major one with an effect of 17.3 days on flowering time and explaining 53.4% of the total variance. The others had effects of at most 6.5 days, and they accounted for only small portions of the variance. Epistasis was indicated between one pair of the QTLs. The result of finding one major QTL and little epistasis agrees with previous studies on flowering time in Arabidopsis thaliana and other species. That several QTLs were found was expected considering the large number of possible candidate loci. In the light of the suggested genetic models of gene action at the candidate loci, epistasis was to be expected. The data showed that major QTLs for adaptive traits can be detected in non-domesticated species.


Annals of Human Genetics | 2004

Replication in genetic studies of complex traits

Mikko J. Sillanpää; Kari Auranen

Disappointments in replicating initial findings in gene mapping for complex traits are often attributed to small sample sizes and inadequate techniques to determine the threshold value. This is clearly not the whole truth. More fundamental reasons lie in the inherent heterogeneity related to disease, including genetic heterogeneity, differences in allele frequencies, and context‐dependency in genetic architecture. There are also other reasons related to the data collection and analysis. Replication may remain a source of frustration unless more emphasis is put on controlling these sources of heterogeneity between studies.


Heredity | 2011

Overview of techniques to account for confounding due to population stratification and cryptic relatedness in genomic data association analyses

Mikko J. Sillanpää

Population-based genomic association analyses are more powerful than within-family analyses. However, population stratification (unknown or ignored origin of individuals from multiple source populations) and cryptic relatedness (unknown or ignored covariance between individuals because of their relatedness) are confounding factors in population-based genomic association analyses, which inflate the false-positive rate. As a consequence, false association signals may arise in genomic data association analyses for reasons other than true association between the tested genomic factor (marker genotype, gene or protein expression) and the study phenotype. It is therefore important to correct or account for these confounders in population-based genomic data association analyses. The common correction techniques for population stratification and cryptic relatedness problems are presented here in the phenotype–marker association analysis context, and comments on their suitability for other types of genomic association analyses (for example, phenotype–expression association) are also provided. Even though many of these techniques have originally been developed in the context of human genetics, most of them are also applicable to model organisms and breeding populations.
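
The inflation of false positives described above can be illustrated with genomic control, one widely used correction. The abstract does not name specific techniques, so choosing genomic control here is an assumption, and the function name, simulated statistics and inflation factor of 1.3 are hypothetical. A minimal sketch:

```python
import numpy as np

# Median of a chi-square distribution with 1 degree of freedom.
CHI2_1DF_MEDIAN = 0.4549364


def genomic_control(chi2_stats):
    """Estimate the inflation factor lambda and deflate the test statistics.

    lambda is the observed median statistic divided by the median expected
    under the null; it is floored at 1 so statistics are never inflated.
    """
    chi2_stats = np.asarray(chi2_stats, dtype=float)
    lam = max(np.median(chi2_stats) / CHI2_1DF_MEDIAN, 1.0)
    return lam, chi2_stats / lam


# Hypothetical example: null statistics inflated by a factor of 1.3,
# mimicking the effect of unmodelled stratification or relatedness.
rng = np.random.default_rng(1)
inflated = 1.3 * rng.chisquare(df=1, size=20000)

lambda_hat, corrected = genomic_control(inflated)
print(f"estimated inflation factor: {lambda_hat:.3f}")
```

Dividing by a single factor assumes the confounding inflates all markers equally, which is a simplification; it is shown only to make the idea of correcting an inflated false-positive rate concrete.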


Genetics | 2010

Extended Bayesian LASSO for Multiple Quantitative Trait Loci Mapping and Unobserved Phenotype Prediction

Crispin M. Mutshinda; Mikko J. Sillanpää

The Bayesian LASSO (BL) has been pointed out to be an effective approach to sparse model representation and successfully applied to quantitative trait loci (QTL) mapping and genomic breeding value (GBV) estimation using genome-wide dense sets of markers. However, the BL relies on a single parameter known as the regularization parameter to simultaneously control the overall model sparsity and the shrinkage of individual covariate effects. This may be idealistic when dealing with a large number of predictors whose effect sizes may differ by orders of magnitude. Here we propose the extended Bayesian LASSO (EBL) for QTL mapping and unobserved phenotype prediction, which introduces an additional level to the hierarchical specification of the BL to explicitly separate out these two model features. Compared to the adaptiveness of the BL, the EBL is “doubly adaptive” and thus, more robust to tuning. In simulations, the EBL outperformed the BL in regard to the accuracy of both effect size estimates and phenotypic value predictions, with comparable computational time. Moreover, the EBL proved to be less sensitive to tuning than the related Bayesian adaptive LASSO (BAL), which introduces locus-specific regularization parameters as well, but involves no mechanism for distinguishing between model sparsity and parameter shrinkage. Consequently, the EBL seems to point to a new direction for QTL mapping, phenotype prediction, and GBV estimation.


Theoretical and Applied Genetics | 2012

Overview of LASSO-related penalized regression methods for quantitative trait mapping and genomic selection

Zitong Li; Mikko J. Sillanpää

Quantitative trait loci (QTL)/association mapping aims at finding genomic loci associated with the phenotypes, whereas genomic selection focuses on breeding value prediction based on genomic data. Variable selection is key to both of these tasks, as it allows one to (1) detect clear mapping signals of QTL activity, and (2) predict the genome-enhanced breeding values accurately. In this paper, we provide an overview of a statistical method called least absolute shrinkage and selection operator (LASSO) and two of its generalizations named elastic net and adaptive LASSO in the contexts of QTL mapping and genomic breeding value prediction in plants (or animals). We also briefly summarize the Bayesian interpretation of LASSO, and the inspired hierarchical Bayesian models. We illustrate the implementation and examine the performance of methods using three public data sets: (1) North American barley data with 127 individuals and 145 markers, (2) a simulated QTLMAS XII data set with 5,865 individuals and 6,000 markers for both QTL mapping and genomic selection, and (3) a wheat data set with 599 individuals and 1,279 markers only for genomic selection.
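
To make the variable-selection role of LASSO concrete, the following is a minimal sketch of LASSO fitted by cyclic coordinate descent on simulated marker data. It is not the implementation used in the paper: the solver choice, the function names, the simulated genotypes, the two effect sizes and the penalty value 0.2 are all assumptions made for illustration.

```python
import numpy as np


def soft_threshold(z, gamma):
    """Shrink z towards zero by gamma; values within [-gamma, gamma] become 0."""
    return np.sign(z) * max(abs(z) - gamma, 0.0)


def lasso_coordinate_descent(X, y, lam, n_iter=200):
    """Minimise (1 / 2n) * ||y - X b||^2 + lam * ||b||_1 over b."""
    n, p = X.shape
    beta = np.zeros(p)
    col_sq = (X ** 2).sum(axis=0) / n   # per-marker mean squared value
    residual = y - X @ beta
    for _ in range(n_iter):
        for j in range(p):
            if col_sq[j] == 0.0:
                continue                # monomorphic marker carries no signal
            # Covariance of marker j with the residual that excludes its own effect.
            rho = X[:, j] @ residual / n + col_sq[j] * beta[j]
            new_beta = soft_threshold(rho, lam) / col_sq[j]
            residual += X[:, j] * (beta[j] - new_beta)
            beta[j] = new_beta
    return beta


# Hypothetical data: 200 individuals, 50 markers coded 0/1/2, two true QTL.
rng = np.random.default_rng(0)
n_individuals, n_markers = 200, 50
genotypes = rng.integers(0, 3, size=(n_individuals, n_markers)).astype(float)
X = genotypes - genotypes.mean(axis=0)          # centre each marker
true_effects = np.zeros(n_markers)
true_effects[[5, 20]] = [2.0, -1.5]
y = X @ true_effects + rng.normal(0.0, 0.5, size=n_individuals)
y = y - y.mean()                                # centre the phenotype

beta_hat = lasso_coordinate_descent(X, y, lam=0.2)
selected = np.flatnonzero(beta_hat)             # markers with non-zero effect
print("selected markers:", selected)
```

The L1 penalty sets most marker effects exactly to zero, which is what yields a sparse set of candidate loci; the retained effects are shrunk towards zero, so they underestimate the simulated values.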


Theoretical and Applied Genetics | 2001

Bayesian versus frequentist analysis of multiple quantitative trait loci with an application to an outbred apple cross

Chris Maliepaard; Mikko J. Sillanpää; J. W. van Ooijen; Ritsert C. Jansen; Elja Arjas

Two methods, following different statistical paradigms for mapping multiple quantitative trait loci (QTLs), were compared: the first is a frequentist, the second a Bayesian approach. Both methods were applied to previously published experimental data from an outbred progeny of a single cross between two apple cultivars (Malus pumila Mill.). These approaches were compared with respect to (1) the models used, (2) the number of putative QTLs, (3) their estimated map positions and accuracies thereof and (4) the choice of cofactor markers. In general, the strongest evidence for QTLs, provided by both methods, was for the same linkage groups and for similar map positions. However, some differences were found with respect to evidence for QTLs on other linkage groups. The effect of using cofactor markers which were selected differently was also somewhat different.

Collaboration


Dive into Mikko J. Sillanpää's collaborations.

Top Co-Authors

Elja Arjas

University of Helsinki

Zitong Li

University of Helsinki

J. Juga

University of Helsinki

Patrik Waldmann

Swedish University of Agricultural Sciences