Petteri Sevon
University of Helsinki
Publication
Featured research published by Petteri Sevon.
data integration in the life sciences | 2006
Petteri Sevon; Lauri Eronen; Petteri Hintsanen; Kimmo Kulovesi; Hannu Toivonen
Public biological databases contain vast amounts of rich data that can also be used to create and evaluate new biological hypotheses. We propose a method for link discovery in biological databases, i.e., for prediction and evaluation of implicit or previously unknown connections between biological entities and concepts. In our framework, information extracted from available databases is represented as a graph, where vertices correspond to entities and concepts, and edges represent known, annotated relationships between vertices. A link, an (implicit and possibly unknown) relation between two entities, is manifested as a path or a subgraph connecting the corresponding vertices. We propose measures for link goodness that are based on three factors: edge reliability, relevance, and rarity. We handle these factors with a proper probabilistic interpretation. We give practical methods for finding and evaluating links in large graphs and report experimental results with Alzheimer genes and protein interactions.
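One simple way to read the probabilistic interpretation above: if each edge carries an independent probability of being a true and relevant relationship, the goodness of a single path is the product of its edge probabilities, and the most probable path can be found with Dijkstra's algorithm on negative log-probabilities. The sketch below assumes exactly that simplification. The toy graph, the edge probabilities, and the function name `best_path` are hypothetical illustrations, not the measures or methods defined in the paper.

```python
import heapq
import math

def best_path(edges, source, target):
    """Return (probability, path) of the most probable path, or (0.0, []) if none.

    edges: dict mapping (u, v) -> probability in (0, 1]; treated as undirected.
    """
    adj = {}
    for (u, v), p in edges.items():
        adj.setdefault(u, []).append((v, p))
        adj.setdefault(v, []).append((u, p))

    # Dijkstra on cost = -log(p): minimizing the summed cost maximizes the product.
    dist = {source: 0.0}
    prev = {}
    heap = [(0.0, source)]
    while heap:
        cost, u = heapq.heappop(heap)
        if cost > dist.get(u, math.inf):
            continue  # stale queue entry
        if u == target:
            break
        for v, p in adj.get(u, []):
            new_cost = cost - math.log(p)
            if new_cost < dist.get(v, math.inf):
                dist[v] = new_cost
                prev[v] = u
                heapq.heappush(heap, (new_cost, v))

    if target not in dist:
        return 0.0, []
    # Walk the predecessor links back from the target to recover the path.
    path = [target]
    while path[-1] != source:
        path.append(prev[path[-1]])
    path.reverse()
    return math.exp(-dist[target]), path

# Hypothetical toy graph: a gene, two proteins, and a disease concept.
toy_edges = {
    ("geneA", "protein1"): 0.9,
    ("protein1", "disease"): 0.5,
    ("geneA", "protein2"): 0.6,
    ("protein2", "disease"): 0.9,
}
prob, path = best_path(toy_edges, "geneA", "disease")
```

In this toy graph the route through `protein2` has probability 0.6 × 0.9 = 0.54, which beats 0.9 × 0.5 = 0.45 through `protein1`. A single best path is only the simplest case; the abstract also considers subgraphs as links.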
IEEE/ACM Transactions on Computational Biology and Bioinformatics | 2006
Petteri Sevon; Hannu Toivonen; Vesa Ollikainen
We describe TreeDT, a novel association-based gene mapping method. Given a set of disease-associated haplotypes and a set of control haplotypes, TreeDT predicts likely locations of a disease susceptibility gene. TreeDT extracts, essentially in the form of haplotype trees, information about historical recombinations in the population: A haplotype tree constructed at a given chromosomal location is an estimate of the genealogy of the haplotypes. TreeDT constructs these trees for all locations on the given haplotypes and performs a novel disequilibrium test on each tree: Is there a small set of subtrees with relatively high proportions of disease-associated chromosomes, suggesting shared genetic history for those and a likely disease gene location? We give a detailed description of TreeDT and the tree disequilibrium tests, we analyze the algorithm formally, and we evaluate its performance experimentally on both simulated and real data sets. Experimental results demonstrate that TreeDT has high accuracy on difficult mapping tasks, and comparisons to other methods (EATDT, HPM, TDT) show that TreeDT is very competitive.
bioinformatics and bioengineering | 2000
Hannu Toivonen; Päivi Onkamo; Kari Vasko; Vesa Ollikainen; Petteri Sevon; Heikki Mannila; Juha Kere
Genetic markers are being increasingly utilized in gene mapping. The discovery of associations between markers and patient phenotypes, such as a disease status, enables the identification of potential disease gene loci. The rationale is that, in diseases with a reasonable genetic contribution, diseased individuals are more likely to have associated marker alleles near the disease susceptibility gene than control individuals. We describe a new gene mapping method, haplotype pattern mining (HPM), that is based on discovering recurrent marker patterns. We define a class of useful haplotype patterns in genetic case-control data, give an algorithm for finding disease-associated haplotypes, and show how to use them to identify disease susceptibility loci. Experimental studies show that the method has good localization power in data sets with large degrees of phenocopies and with lots of missing and erroneous data. We also demonstrate how the method can be used to discover several genes simultaneously.
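To make the idea of a disease-associated haplotype pattern concrete, the following minimal sketch scores one pattern against case and control haplotypes. It assumes, for illustration only, that a pattern is a string over marker alleles with `*` as a wildcard and that association strength is a Pearson chi-square statistic on the 2x2 table of matches. The abstract does not specify either choice, and all names and data here are hypothetical.

```python
def matches(pattern, haplotype):
    """A pattern matches if every non-wildcard position agrees with the haplotype."""
    return all(p == "*" or p == h for p, h in zip(pattern, haplotype))

def chi_square_2x2(a, b, c, d):
    """Pearson chi-square statistic for the table [[a, b], [c, d]], no continuity correction."""
    n = a + b + c + d
    denom = (a + b) * (c + d) * (a + c) * (b + d)
    if denom == 0:
        return 0.0  # a zero margin means the table carries no association signal
    return n * (a * d - b * c) ** 2 / denom

def pattern_association(pattern, cases, controls):
    """Score a pattern by how unevenly it matches case versus control haplotypes."""
    a = sum(matches(pattern, h) for h in cases)     # cases matching the pattern
    c = sum(matches(pattern, h) for h in controls)  # controls matching the pattern
    return chi_square_2x2(a, len(cases) - a, c, len(controls) - c)

# Hypothetical haplotypes over four biallelic markers (alleles coded "1" and "2").
cases = ["1212", "1211", "1222", "2112"]
controls = ["2121", "2112", "1212", "2221"]
score = pattern_association("12**", cases, controls)
```

Here the pattern `12**` matches three of four cases and one of four controls. A full method would search over many patterns and combine their scores per marker; this sketch covers only the scoring of a single pattern.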
Journal of Medical Genetics | 2005
Petteri Hintsanen; Petteri Sevon; Päivi Onkamo; Lauri Eronen; Hannu Toivonen
Motivated by high throughput genotyping technology, our aim in this study was to experimentally compare the power and accuracy of case-control and family trio based approaches for haplotype based, large scale, association gene mapping. We compared trio based and case-control study designs in different disease models, and partitioned the performance differences into separate components: those from the sample ascertainment, the effective sample size, and the haplotyping approaches. For systematic and controlled tests, we simulated a rapidly expanding and relatively young isolated population. The experiments were also replicated with real asthma data. We used computationally efficient methods that scale up to large amounts of both markers and individuals. Mapping is based on a haplotype association test for haplotypes of 1–10 markers. For population based haplotype reconstruction, we use HaploRec, and compare it to both a simple trio based inference and true haplotypes. Firstly and surprisingly, statistically inferred population based haplotypes can be as powerful as true haplotypes. Secondly, as expected, the effective sample size has a clear effect on both gene detection power and mapping accuracy. Thirdly, the sample ascertainment method does not have much effect on mapping accuracy. Finally, an interesting side result is that the simple haplotype association test clearly outperformed exhaustive allelic transmission disequilibrium tests. The results suggest that the case-control design is a powerful alternative to the more laborious family based ascertainment approach, especially for large datasets, and wherever population stratification can be controlled.
Journal of Integrative Bioinformatics | 2008
Petteri Sevon; Lauri Eronen
We describe a method for querying vertex- and edge-labeled graphs using context-free grammars to specify the class of interesting paths. We introduce a novel problem: finding the connection subgraph induced by the set of matching paths between two given vertices or two sets of vertices. Such a subgraph provides a concise summary of the relationship between the vertices. We also present novel algorithms for parsing subgraphs directly without enumerating all the individual paths. We evaluate the presented parsing algorithms experimentally on a set of real graphs derived from publicly available biomedical databases and on randomly generated graphs. The results indicate that parsing the connection subgraph directly is much more effective than parsing individual paths separately. Furthermore, we show that a bidirectional parsing algorithm, in most cases, allows for searching paths twice as long as those reachable with a unidirectional search strategy.
advances in social networks analysis and mining | 2010
Petteri Hintsanen; Hannu Toivonen; Petteri Sevon
We present a novel and efficient algorithm, Path Covering, for solving the most reliable subgraph problem. A reliable subgraph gives a concise summary of the connectivity between two given individuals in a social network. Formally, the given network is seen as a Bernoulli random graph G, and the objective is to find a subgraph H with at most B edges such that the probability that a path exists in H between the two given individuals is maximized. The algorithm is based on an efficient stochastic search of candidate paths, and the use of Monte-Carlo simulation to cast the problem as a set cover problem. Experimental evaluation on real graphs derived from the DBLP bibliography database indicates superior performance of the proposed algorithm.
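The objective above, the probability that a path exists between two vertices in a Bernoulli random graph, can be estimated by plain Monte-Carlo sampling: draw each edge independently with its probability, then test connectivity. The sketch below shows only this estimation step under that assumption. It is not the Path Covering algorithm, and the function names and toy graph are hypothetical.

```python
import random

def connected(edges, s, t):
    """Depth-first search: are s and t joined by the given undirected edges?"""
    adj = {}
    for u, v in edges:
        adj.setdefault(u, set()).add(v)
        adj.setdefault(v, set()).add(u)
    seen, stack = {s}, [s]
    while stack:
        u = stack.pop()
        if u == t:
            return True
        for v in adj.get(u, ()):
            if v not in seen:
                seen.add(v)
                stack.append(v)
    return False

def estimate_reliability(edge_probs, s, t, samples=20000, seed=0):
    """Estimate P(s and t are connected) when each edge exists independently."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(samples):
        # One realization of the random graph: keep each edge with its probability.
        realized = [e for e, p in edge_probs.items() if rng.random() < p]
        if connected(realized, s, t):
            hits += 1
    return hits / samples

# Hypothetical toy graph: two disjoint two-edge routes from s to t.
toy = {("s", "a"): 0.5, ("a", "t"): 0.5, ("s", "b"): 0.5, ("b", "t"): 0.5}
estimate = estimate_reliability(toy, "s", "t")
```

For this toy graph the exact value is 1 − (1 − 0.25)² = 0.4375, since each route survives with probability 0.25 and the routes share no edges, so the estimate should land close to that figure.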
knowledge discovery and data mining | 2001
Petteri Sevon; Hannu Toivonen; Vesa Ollikainen
We introduce and evaluate TreeDT, a novel gene mapping method which is based on discovering and assessing tree-like patterns in genetic marker data. Gene mapping aims at discovering a statistical connection from a particular disease or trait to a narrow region in the genome. In a typical case-control setting, data consists of genetic markers typed for a set of disease-associated chromosomes and a set of control chromosomes. A computer scientist would view this data as a set of strings. TreeDT extracts, essentially in the form of substrings and prefix trees, information about the historical recombinations in the population. This information is used to locate fragments potentially inherited from a common diseased founder, and to map the disease gene into the most likely such fragment. The method measures for each chromosomal location the disequilibrium of the prefix tree of marker strings starting from the location, to assess the distribution of disease-associated chromosomes. We evaluate experimentally the performance of TreeDT on realistic, simulated data sets, and comparisons to state of the art methods (TDT, HPM) show that TreeDT is very competitive.
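The prefix tree of marker strings starting from a location can be pictured with a small trie in which every node counts how many disease-associated and control chromosomes pass through it. The sketch below builds only that data structure and does not implement the disequilibrium test. The node layout, the function name `build_prefix_tree`, and the toy data are assumptions made for illustration.

```python
def build_prefix_tree(haplotypes, statuses, start):
    """Build a trie over the marker strings beginning at marker index `start`.

    Each node is a dict: {"cases": int, "controls": int, "children": {allele: node}}.
    statuses[i] is True when haplotypes[i] is disease-associated.
    """
    root = {"cases": 0, "controls": 0, "children": {}}
    for hap, is_case in zip(haplotypes, statuses):
        key = "cases" if is_case else "controls"
        node = root
        node[key] += 1
        for allele in hap[start:]:
            # Descend to the child for this allele, creating it on first visit.
            node = node["children"].setdefault(
                allele, {"cases": 0, "controls": 0, "children": {}}
            )
            node[key] += 1
    return root

# Hypothetical data: four chromosomes typed at four biallelic markers.
haplotypes = ["1221", "1212", "1221", "2112"]
statuses = [True, False, True, False]
tree = build_prefix_tree(haplotypes, statuses, start=1)
shared = tree["children"]["2"]["children"]["2"]
```

In this toy example the subtree for the prefix `22` holds both disease-associated chromosomes and no controls, the kind of skewed subtree a disequilibrium test would look for.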
Genetic Epidemiology | 2001
Petteri Sevon; Vesa Ollikainen; Päivi Onkamo; Hannu Toivonen; Heikki Mannila; Juha Kere
We used Haplotype Pattern Mining, HPM [Toivonen et al., Am J Hum Genet 67:133–45, 2000], for gene localization in Genetic Analysis Workshop (GAW) 12 isolate data. In HPM, association is analyzed by searching all trait‐associated haplotype patterns. Data mining algorithms are utilized to make the search efficient. The strength of the haplotype‐trait associations is measured by a linear model, into which a pre‐selected set of covariates is incorporated. Marker‐wise patterns of association are used for predicting the disease gene location. Genome‐wide scans of susceptibility genes for affection status as well as for the quantitative traits (Q1–Q5) were performed. First analyses were made with small sample sizes, 63–94 trios per trait, which is compared with a pilot study of a larger complex disease‐mapping project. Subsequently, the analysis was repeated with approximately 600 cases and 600 controls per trait to give higher power to the analyses. With small sample sizes, only the susceptibility genes having the strongest effects on the traits could be localized. The larger sample size gave very good results: all susceptibility genes, except one, could be correctly localized. First experiments on candidate genes suggested that HPM is applicable even to fine mapping of mutations in DNA sequence.
European Journal of Human Genetics | 2002
Anne Polvi; Tuomas T. T. Polvi; Petteri Sevon; Tuula Petäys; Tari Haahtela; Lauri A. Laitinen; Juha Kere; Tarja Laitinen
Chromosome 7p15-p14 showed genome-wide significant linkage to asthma related traits among the Finnish and French-Canadian families. As an essential step toward cloning the susceptibility gene, a detailed physical map of the region is needed. In this study we report a dense set of carefully tested, new microsatellite markers for fine mapping embedded in a continuous, easy-to-read, physical map of the region that includes the known genes and putative transcripts. Even though susceptibility genes for asthma are difficult to predict from a multitude of unknown genes mapped to the region, TCRG encoding the γ-chain of the heterodimeric γ/δ T cell receptor is a potential candidate. We present linkage and association results for TCRG in two independent Finnish family sets by using four highly polymorphic microsatellites spanning 169 kb across the locus. Linkage results confirmed our previous findings, but our study did not provide any evidence for a strong association of TCRG with either high serum total immunoglobulin E (IgE) level or asthma. Our results suggest that some other known or yet unidentified gene in the linkage region is the true asthma susceptibility gene.
data mining in bioinformatics | 2005
Petteri Sevon; Hannu Toivonen; Päivi Onkamo
The objective of gene mapping is to localize genes responsible for a particular disease or trait. We consider association-based gene mapping, where the data consist of markers genotyped for a sample of independent case and control individuals. In this chapter we give a generic framework for nonparametric gene mapping based on pattern discovery. We have previously introduced two instances of the framework: haplotype pattern mining (HPM) for case-control haplotype material and QHPM for quantitative trait and covariates. In our experiments, HPM has proven to be very competitive compared to other methods. Geneticists have found the output of HPM useful, and today HPM is routinely used for analyses by several research groups. We review these methods and present a novel instance, HPM-G, suitable for directly analyzing phase-unknown genotype data. Obtaining haplotypes is more costly than obtaining phase-unknown genotypes, and our experiments show that although larger samples are needed with HPM-G, it is still in many cases more cost-effective than analysis with haplotype data.