Network


Latest external collaborations at the country level. Dive into the details by clicking on the dots.

Hotspot


Dive into the research topics where Petri Kontkanen is active.

Publication


Featured research published by Petri Kontkanen.


Information Processing Letters | 2007

A linear-time algorithm for computing the multinomial stochastic complexity

Petri Kontkanen; Petri Myllymäki

The minimum description length (MDL) principle is a theoretically well-founded, general framework for performing model class selection and other types of statistical inference. This framework can be applied for tasks such as data clustering, density estimation and image denoising. The MDL principle is formalized via the so-called normalized maximum likelihood (NML) distribution, which has several desirable theoretical properties. The codelength of a given sample of data under the NML distribution is called the stochastic complexity, which is the basis for MDL model class selection. Unfortunately, in the case of discrete data, straightforward computation of the stochastic complexity requires exponential time with respect to the sample size, since the definition involves an exponential sum over all the possible data samples of a fixed size. As a main contribution of this paper, we derive an elegant recursion formula which allows efficient computation of the stochastic complexity in the case of n observations of a single multinomial random variable with K values. The time complexity of the new method is O(n + K) as opposed to O(n log n log K) obtained with the previous results.
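The linear-time recursion described in this abstract can be sketched as follows. This is a minimal illustration, assuming the standard base cases for the multinomial NML normalizing constant C(K, n) and the recursion C(K, n) = C(K-1, n) + n/(K-2) * C(K-2, n) for K >= 3; the exact formulation in the paper may differ in detail.

```python
from math import comb

def multinomial_complexity(K, n):
    """NML normalizing constant C(K, n) for a K-valued multinomial
    with n observations, computed in O(n + K) time via the recursion
    sketched in the abstract (base cases assumed, not quoted):
      C(1, n) = 1
      C(2, n) = sum_h binom(n, h) (h/n)^h ((n-h)/n)^(n-h)
      C(K, n) = C(K-1, n) + n/(K-2) * C(K-2, n),  K >= 3
    """
    if n == 0 or K == 1:
        return 1.0
    # Binomial base case; Python evaluates 0.0 ** 0 as 1.0, as required.
    c2 = sum(comb(n, h) * (h / n) ** h * ((n - h) / n) ** (n - h)
             for h in range(n + 1))
    if K == 2:
        return c2
    prev, cur = 1.0, c2          # C(K-2, n), C(K-1, n)
    for k in range(3, K + 1):
        prev, cur = cur, cur + n / (k - 2) * prev
    return cur
```

For a single observation (n = 1) every value has maximum-likelihood probability 1, so C(K, 1) = K, which the recursion reproduces.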


Personal, Indoor and Mobile Radio Communications | 2004

Topics in probabilistic location estimation in wireless networks

Petri Kontkanen; Petri Myllymäki; Teemu Roos; Henry Tirri; Kimmo Valtonen; Hannes Wettig

In this survey-style paper we demonstrate the usefulness of the probabilistic modelling framework in solving not only the actual positioning problem, but also many related problems involving issues like calibration, active learning, error estimation and tracking with history. We also point out some interesting links between positioning research done in the area of robotics and in the area of wireless radio networks.


Statistics and Computing | 2000

On predictive distributions and Bayesian networks

Petri Kontkanen; Petri Myllymäki; Tomi Silander; Henry Tirri; Peter Grünwald

In this paper we are interested in discrete prediction problems for a decision-theoretic setting, where the task is to compute the predictive distribution for a finite set of possible alternatives. This question is first addressed in a general Bayesian framework, where we consider a set of probability distributions defined by some parametric model class. Given a prior distribution on the model parameters and a set of sample data, one possible approach for determining a predictive distribution is to fix the parameters to the instantiation with the maximum a posteriori probability. A more accurate predictive distribution can be obtained by computing the evidence (marginal likelihood), i.e., the integral over all the individual parameter instantiations. As an alternative to these two approaches, we demonstrate how to use Rissanen's new definition of stochastic complexity for determining predictive distributions, and show how the evidence predictive distribution with Jeffreys prior approaches the new stochastic complexity predictive distribution in the limit with increasing amount of sample data. To compare the alternative approaches in practice, each of the predictive distributions discussed is instantiated in the Bayesian network model family case. In particular, to determine Jeffreys prior for this model family, we show how to compute the (expected) Fisher information matrix for a fixed but arbitrary Bayesian network structure. In the empirical part of the paper the predictive distributions are compared by using the simple tree-structured Naive Bayes model, which is used in the experiments for computational reasons. The experimentation with several public domain classification datasets suggests that the evidence approach produces the most accurate predictions in the log-score sense. The evidence-based methods are also quite robust in the sense that they predict surprisingly well even when only a small fraction of the full training set is used.
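The contrast between the plug-in and evidence approaches can be illustrated for a single Bernoulli variable, the simplest instance of the setting above. This is a toy illustration, not the paper's Bayesian-network instantiation: the plug-in predictive fixes the maximum-likelihood parameter, while the Jeffreys-prior evidence predictive is the well-known Krichevsky-Trofimov estimator.

```python
def predictive_probs(successes, n):
    """P(next observation = 1) after `successes` ones in n Bernoulli trials.

    Two of the approaches compared in the abstract, in their simplest form:
    - plug-in: fix the parameter at its maximum-likelihood value s/n
    - evidence with the Jeffreys Beta(1/2, 1/2) prior, whose predictive
      is the Krichevsky-Trofimov estimator (s + 1/2) / (n + 1)
    """
    plug_in = successes / n
    jeffreys = (successes + 0.5) / (n + 1)
    return plug_in, jeffreys
```

Note that the plug-in predictive assigns probability zero after an all-failure sample, while the evidence predictive never does; the two agree in the large-sample limit.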


Archive | 2005

Probabilistic Methods for Location Estimation in Wireless Networks

Petri Kontkanen; Petri Myllymäki; Teemu Roos; Henry Tirri; Kimmo Valtonen; Hannes Wettig

Probabilistic modeling techniques offer a unifying theoretical framework for solving the problems encountered when developing location-aware and location-sensitive applications in wireless radio networks. In this paper we demonstrate the usefulness of the probabilistic modelling framework in solving not only the actual location estimation (positioning) problem, but also many related problems involving pragmatically important issues like calibration, active learning, error estimation and tracking with history. Some interesting links between positioning research done in the area of robotics and in the area of wireless radio networks are also discussed.
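The calibration-based positioning idea mentioned in the abstract can be sketched with a toy model (my assumption, not the paper's actual model): each candidate location stores a calibrated mean signal strength per access point, readings are treated as independent Gaussians with a shared standard deviation, and Bayes' rule with a uniform prior yields a posterior over locations.

```python
import math

def location_posterior(observed_rssi, fingerprints, sigma=4.0):
    """Posterior over candidate locations given one RSSI reading vector.

    Toy probabilistic positioning sketch: `fingerprints` maps each location
    to its calibrated mean RSSI per access point; readings are modeled as
    independent Gaussians with shared std dev `sigma`; the prior is uniform.
    """
    log_lik = {}
    for loc, means in fingerprints.items():
        # Gaussian log-likelihood up to a constant shared by all locations.
        log_lik[loc] = sum(-(r - m) ** 2 / (2 * sigma ** 2)
                           for r, m in zip(observed_rssi, means))
    # Normalize in a numerically stable way (subtract the max log-likelihood).
    z = max(log_lik.values())
    weights = {loc: math.exp(l - z) for loc, l in log_lik.items()}
    total = sum(weights.values())
    return {loc: w / total for loc, w in weights.items()}
```

A full posterior like this, rather than a single location estimate, is what makes error estimation and tracking with history natural extensions of the same framework.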


Journal of Food Engineering | 2003

Novel computational tools in bakery process data analysis: a comparative study

Juho Rousu; Laura Flander; Marjaana Suutarinen; Karin Autio; Petri Kontkanen; Ari Rantanen

We studied the potential of various machine learning and statistical methods in the prediction of product quality in industrial bakery processes. The methods included classification and regression tree, decision list, neural network, support vector machine and Bayesian learning algorithms as well as statistical multivariate methods. Our data originated from two industrial bakery processes: a sourdough rye bread and a Danish pastry process. In our studies, the Naive Bayesian algorithm turned out to be the best classifier building algorithm while the partial least squares (PLS) method was the best regression method. The prediction accuracy of these models improved significantly by pruning the original set of variables. In this study, two response variables could be predicted on a level that justifies further study: rye bread pH could be predicted with high accuracy with Naive Bayesian Classifier, and Danish pastry height could be predicted with a moderately high correlation with PLS.


EWCBR '96 Proceedings of the Third European Workshop on Advances in Case-Based Reasoning | 1996

A Bayesian Framework for Case-Based Reasoning

Henry Tirri; Petri Kontkanen; Petri Myllymäki

In this paper we present a probabilistic framework for case-based reasoning in data-intensive domains, where only weak prior knowledge is available. In such a probabilistic viewpoint the attributes are interpreted as random variables, and the case base is used to approximate the underlying joint probability distribution of the attributes. Consequently structural case adaptation (and parameter adjustment in particular) can be viewed as prediction based on the full probability model constructed from the case history. The methodology addresses several problems encountered in building case-based reasoning systems. It provides a computationally efficient structural adaptation algorithm, avoids over-fitting by using Bayesian model selection and uses probabilities directly as measures of similarity. The methodology described has been implemented in the D-SIDE software package, and the approach is validated by presenting empirical results of the method's classification prediction performance for a set of public domain data sets.


Lecture Notes in Computer Science | 1998

On Bayesian Case Matching

Petri Kontkanen; Petri Myllymäki; Tomi Silander; Henry Tirri

Case retrieval is an important problem in several commercially significant application areas, such as industrial configuration and manufacturing problems. In this paper we extend the Bayesian probability theory based approaches to case-based reasoning, focusing on the case matching task, an essential part of any case retrieval system. Traditional approaches to the case matching problem typically rely on some distance measure, e.g., the Euclidean or Hamming distance, although there is no a priori guarantee that such measures really reflect the useful similarities and dissimilarities between the cases. One of the main advantages of the Bayesian framework for solving this problem is that it forces one to explicitly recognize all the assumptions made about the problem domain, which helps in analyzing the performance of the resulting system. As an example of an implementation of the Bayesian case matching approach in practice, we demonstrate how to construct a case retrieval system based on a set of independence assumptions between the domain variables. In the experimental part of the paper, the Bayesian case matching metric is evaluated empirically in a case-retrieval task by using public domain discrete real-world databases. The results suggest that case retrieval systems based on the Bayesian case matching score perform much better than case retrieval systems based on the standard Hamming distance similarity metrics.
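The contrast between distance-based and probability-based matching can be sketched with a toy scoring model (my construction, not the paper's metric): each query attribute either copies the case value or is drawn from the attribute's marginal distribution, making every modeling assumption explicit, which is the advantage the abstract emphasizes.

```python
from math import log

def hamming(a, b):
    """Standard Hamming distance between two discrete attribute vectors."""
    return sum(x != y for x, y in zip(a, b))

def bayes_match_score(query, case, marginals, eps=0.1):
    """Log P(query | case) under a simple per-attribute noise model:
    each query attribute copies the case value with probability 1 - eps,
    and otherwise is drawn from that attribute's marginal distribution.
    Illustrative probabilistic similarity only, not the paper's metric;
    `marginals` is a list of {value: probability} dicts, one per attribute.
    """
    score = 0.0
    for q, c, p in zip(query, case, marginals):
        score += log((1 - eps) * (q == c) + eps * p.get(q, 0.0))
    return score
```

Unlike the Hamming distance, which charges every mismatch equally, this score penalizes a mismatch more when the query value is rare under the attribute's marginal.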


Information Theory and Applications | 2008

Bayesian network structure learning using factorized NML universal models

Teemu Roos; Tomi Silander; Petri Kontkanen; Petri Myllymäki

Universal codes/models can be used for data compression and model selection by the minimum description length (MDL) principle. For many interesting model classes, such as Bayesian networks, the minimax regret optimal normalized maximum likelihood (NML) universal model is computationally very demanding. We suggest a computationally feasible alternative to NML for Bayesian networks, the factorized NML universal model, where the normalization is done locally for each variable. This can be seen as an approximate sum-product algorithm. We show that this new universal model performs extremely well in model selection, compared to the existing state-of-the-art, even for small sample sizes.
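The local normalization idea can be sketched for a single network variable. In this hedged illustration (my reading of the abstract, not the paper's exact score), the factorized-NML contribution of one K-valued variable sums, over its parent configurations, the maximum log-likelihood of the local counts minus the log of the multinomial NML normalizing constant for that configuration's sample size.

```python
from math import comb, log

def mult_complexity(K, n):
    """Multinomial NML normalizing constant C(K, n) via the standard
    linear-time recursion (assumed base cases; see the 2007 paper above)."""
    if n == 0 or K == 1:
        return 1.0
    c2 = sum(comb(n, h) * (h / n) ** h * ((n - h) / n) ** (n - h)
             for h in range(n + 1))
    if K == 2:
        return c2
    prev, cur = 1.0, c2
    for k in range(3, K + 1):
        prev, cur = cur, cur + n / (k - 2) * prev
    return cur

def fnml_local_score(counts, K):
    """Illustrative factorized-NML score of one K-valued variable.

    `counts` is a list with one entry per parent configuration; each entry
    lists the observed counts of the K values under that configuration.
    Each configuration contributes its maximized log-likelihood minus the
    log of the local NML normalizer -- the per-variable normalization the
    abstract describes, sketched under my assumptions.
    """
    score = 0.0
    for cfg in counts:
        n = sum(cfg)
        score += sum(c * log(c / n) for c in cfg if c > 0)
        score -= log(mult_complexity(K, n))
    return score
```

Because each normalizer involves only one variable's local counts, the score stays computable even when the global NML normalizer over the whole network is intractable.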


Knowledge Discovery and Data Mining | 2000

Unsupervised Bayesian visualization of high-dimensional data

Petri Kontkanen; Jussi Lahtinen; Petri Myllymäki; Henry Tirri

We propose a data reduction method based on a probabilistic similarity framework where two vectors are considered similar if they lead to similar predictions. We show how this type of a probabilistic similarity metric can be defined both in a supervised and unsupervised manner. As a concrete application of the suggested multidimensional scaling scheme, we describe how the method can be used for producing visual images of high-dimensional data, and give several examples of visualizations obtained by using the suggested scheme with probabilistic Bayesian network models.
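The notion of "similar if they lead to similar predictions" can be illustrated with a simple stand-in dissimilarity between two vectors' predictive distributions, here the symmetrized Kullback-Leibler divergence. This is my illustrative choice, not necessarily the metric used in the paper.

```python
from math import log

def kl(p, q):
    """Kullback-Leibler divergence KL(p || q) between two discrete
    distributions given as aligned probability lists."""
    return sum(pi * log(pi / qi) for pi, qi in zip(p, q) if pi > 0)

def predictive_dissimilarity(p, q):
    """Symmetrized KL divergence between two predictive distributions:
    zero when the two vectors lead to identical predictions, growing as
    their predictions diverge. An illustrative stand-in for the paper's
    probabilistic similarity, not its exact definition."""
    return 0.5 * (kl(p, q) + kl(q, p))
```

Pairwise dissimilarities of this kind are exactly the input a multidimensional scaling procedure needs to embed high-dimensional data in two dimensions for visualization.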


European Conference on Machine Learning | 1998

Bayes Optimal Instance-Based Learning

Petri Kontkanen; Petri Myllymäki; Tomi Silander; Henry Tirri

In this paper we present a probabilistic formalization of the instance-based learning approach. In our Bayesian framework, moving from the construction of an explicit hypothesis to a data-driven instance-based learning approach is equivalent to averaging over all the (possibly infinitely many) individual models. The general Bayesian instance-based learning framework described in this paper can be applied with any set of assumptions defining a parametric model family, and to any discrete prediction task where the number of simultaneously predicted attributes is small, which includes for example all classification tasks prevalent in the machine learning literature. To illustrate the use of the suggested general framework in practice, we show how the approach can be implemented in the special case with the strong independence assumptions underlying the so-called Naive Bayes classifier. The resulting Bayesian instance-based classifier is validated empirically with public domain data sets and the results are compared to the performance of the traditional Naive Bayes classifier. The results suggest that the Bayesian instance-based learning approach yields better results than the traditional Naive Bayes classifier, especially in cases where the amount of the training data is small.
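The traditional Naive Bayes classifier that serves as the baseline in this comparison can be sketched in a few lines. This is a minimal sketch of the classical plug-in classifier with Laplace smoothing, not the paper's Bayesian instance-based method; the data layout is my own.

```python
from collections import Counter
from math import log

def naive_bayes_predict(train, query, alpha=1.0):
    """Classical discrete Naive Bayes with Laplace smoothing.

    `train` is a list of (attribute_tuple, label) pairs and `query` is an
    attribute tuple; the label maximizing the smoothed log-posterior
    (class prior times per-attribute conditionals) is returned.
    """
    labels = Counter(lbl for _, lbl in train)
    best, best_score = None, float("-inf")
    for lbl, n_lbl in labels.items():
        score = log(n_lbl / len(train))          # class prior
        for i, q in enumerate(query):
            hits = sum(1 for x, l in train if l == lbl and x[i] == q)
            n_vals = len({x[i] for x, _ in train})
            # Laplace-smoothed conditional P(attribute_i = q | label).
            score += log((hits + alpha) / (n_lbl + alpha * n_vals))
        if score > best_score:
            best, best_score = lbl, score
    return best
```

The plug-in nature of this baseline (conditionals estimated once, then treated as fixed) is precisely what the paper's model-averaging alternative relaxes, which is why the difference shows up most on small training sets.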

Collaboration


Dive into Petri Kontkanen's collaborations.

Top Co-Authors

Petri Myllymäki
Helsinki Institute for Information Technology

Henry Tirri
Helsinki Institute for Information Technology

Hannes Wettig
Helsinki University of Technology

Kimmo Valtonen
Helsinki Institute for Information Technology

Teemu Roos
Helsinki Institute for Information Technology

Antti Tuominen
Helsinki Institute for Information Technology

Panu Luosto
University of Helsinki