
Publication


Featured research published by Tapani Raiko.


International Conference on Artificial Neural Networks | 2011

Improved learning of Gaussian-Bernoulli restricted Boltzmann machines

KyungHyun Cho; Alexander Ilin; Tapani Raiko

We propose a few remedies to improve training of Gaussian-Bernoulli restricted Boltzmann machines (GBRBM), which is known to be difficult. Firstly, we use a different parameterization of the energy function, which allows for more intuitive interpretation of the parameters and facilitates learning. Secondly, we propose parallel tempering learning for GBRBM. Lastly, we use an adaptive learning rate which is selected automatically in order to stabilize training. Our extensive experiments show that the proposed improvements indeed remove most of the difficulties encountered when training GBRBMs using conventional methods.
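
For orientation, below is a minimal sketch of a Gaussian-Bernoulli RBM energy function under the standard parameterization, with hypothetical toy values; the reparameterization proposed in the paper differs in how the visible units are scaled and is not reproduced exactly here.

```python
import numpy as np

def gbrbm_energy(v, h, W, b, c, sigma):
    """Energy of a Gaussian-Bernoulli RBM under the standard parameterization.

    E(v, h) = sum_i (v_i - b_i)^2 / (2 sigma_i^2)
              - sum_j c_j h_j
              - sum_ij (v_i / sigma_i) W_ij h_j

    The paper proposes a modified parameterization (roughly, the coupling
    uses v_i / sigma_i^2 instead of v_i / sigma_i); this sketch only shows
    the quantities involved, not the exact variant used in the paper.
    """
    quad = np.sum((v - b) ** 2 / (2.0 * sigma ** 2))
    hid = np.dot(c, h)
    coupling = np.dot(v / sigma, W @ h)
    return quad - hid - coupling

# Hypothetical toy example
rng = np.random.default_rng(0)
v = rng.normal(size=4)          # continuous visible units
h = rng.integers(0, 2, size=3)  # binary hidden units
W = rng.normal(scale=0.1, size=(4, 3))
b, c, sigma = np.zeros(4), np.zeros(3), np.ones(4)
print(gbrbm_energy(v, h, W, b, c, sigma))
```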


Journal of Artificial Intelligence Research | 2006

Logical hidden Markov models

Kristian Kersting; Luc De Raedt; Tapani Raiko

Logical hidden Markov models (LOHMMs) upgrade traditional hidden Markov models to deal with sequences of structured symbols in the form of logical atoms, rather than flat characters. This note formally introduces LOHMMs and presents solutions to the three central inference problems for LOHMMs: evaluation, most likely hidden state sequence and parameter estimation. The resulting representation and algorithms are experimentally evaluated on problems from the domain of bioinformatics.
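
As a point of reference, a minimal sketch of the "evaluation" problem for an ordinary HMM follows (the textbook forward algorithm with hypothetical matrices); LOHMMs generalize states and observations to logical atoms, which this plain-HMM sketch does not attempt to capture.

```python
import numpy as np

def forward_evaluation(A, B, pi, obs):
    """Forward algorithm for an ordinary HMM: P(observation sequence).

    LOHMMs generalize this by letting states and observations be logical
    atoms with variables, so one abstract transition covers many ground
    states; this plain-HMM code only illustrates the 'evaluation'
    inference problem mentioned in the abstract.
    A: (S, S) transition matrix, B: (S, O) emission matrix, pi: (S,) prior.
    """
    alpha = pi * B[:, obs[0]]
    for o in obs[1:]:
        alpha = (alpha @ A) * B[:, o]
    return alpha.sum()

# Hypothetical two-state example
A = np.array([[0.7, 0.3], [0.4, 0.6]])
B = np.array([[0.9, 0.1], [0.2, 0.8]])
pi = np.array([0.5, 0.5])
print(forward_evaluation(A, B, pi, obs=[0, 1, 1]))
```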


European Conference on Machine Learning | 2007

Principal Component Analysis for Large Scale Problems with Lots of Missing Values

Tapani Raiko; Alexander Ilin; Juha Karhunen

Principal component analysis (PCA) is a well-known classical data analysis technique. There are a number of algorithms for solving the problem, some scaling better than others to problems with high dimensionality. They also differ in their ability to handle missing values in the data. We study a case where the data are high-dimensional and a majority of the values are missing. In the case of very sparse data, overfitting becomes a severe problem even in simple linear models such as PCA. We propose an algorithm based on speeding up a simple principal subspace rule, and extend it to use regularization and variational Bayesian (VB) learning. The experiments with Netflix data confirm that the proposed algorithm is much faster than any of the compared methods, and that the VB-PCA method provides more accurate predictions for new data than traditional PCA or regularized PCA.
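
A minimal sketch of the underlying idea, assuming a simple alternating least-squares subspace rule that sums only over observed entries and a plain L2 penalty as a stand-in for the regularized and VB variants (not the paper's actual speed-up or VB updates):

```python
import numpy as np

def pca_missing(Y, mask, k, n_iter=50, reg=0.0, seed=0):
    """Toy alternating least-squares PCA that uses only observed entries.

    Y ~ A @ S with Y (d, n), A (d, k), S (k, n); 'mask' marks observed
    values. 'reg' adds simple L2 regularization, a stand-in for the
    regularized/VB variants discussed in the paper (not the paper's
    actual speed-up or VB update rules).
    """
    rng = np.random.default_rng(seed)
    d, n = Y.shape
    A = rng.normal(size=(d, k))
    S = rng.normal(size=(k, n))
    I = np.eye(k)
    for _ in range(n_iter):
        for j in range(n):                       # update each column of S
            o = mask[:, j]
            Ao = A[o]
            S[:, j] = np.linalg.solve(Ao.T @ Ao + reg * I, Ao.T @ Y[o, j])
        for i in range(d):                       # update each row of A
            o = mask[i, :]
            So = S[:, o]
            A[i] = np.linalg.solve(So @ So.T + reg * I, So @ Y[i, o])
    return A, S

# Hypothetical sparse toy data
rng = np.random.default_rng(1)
Y = rng.normal(size=(6, 8))
mask = rng.random((6, 8)) < 0.4   # only ~40% of values observed
A, S = pca_missing(Y, mask, k=2, reg=0.1)
print(np.abs((A @ S - Y)[mask]).mean())
```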


International Symposium on Neural Networks | 2010

Parallel tempering is efficient for learning restricted Boltzmann machines

KyungHyun Cho; Tapani Raiko; Alexander Ilin

Interest in restricted Boltzmann machines (RBMs) has risen again due to their usefulness in greedy learning of deep neural networks. While contrastive divergence learning has been considered an efficient way to learn an RBM, it has a drawback due to a biased approximation in the learning gradient. We propose to use an advanced Monte Carlo method called parallel tempering instead, and show experimentally that it works efficiently.
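
A toy illustration of parallel tempering for a binary RBM is sketched below, with hypothetical weights and a hand-picked temperature ladder; it only shows how tempered Gibbs chains and swap moves fit together, not the training setup evaluated in the paper.

```python
import numpy as np

rng = np.random.default_rng(0)

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def rbm_energy(v, h, W, b, c):
    # Binary-binary RBM energy: E(v, h) = -b.v - c.h - v.W.h
    return -(b @ v + c @ h + v @ W @ h)

def gibbs_step(v, W, b, c, beta):
    # One tempered Gibbs sweep: sample h | v, then v | h, at inverse temperature beta.
    h = (rng.random(c.shape) < sigmoid(beta * (c + v @ W))).astype(float)
    v = (rng.random(b.shape) < sigmoid(beta * (b + W @ h))).astype(float)
    return v, h

def parallel_tempering_sample(W, b, c, betas, n_steps=100):
    """Toy parallel tempering sampler for a binary RBM (illustrative only).

    Runs one Gibbs chain per inverse temperature in 'betas' and proposes
    swaps between neighbouring chains; the beta = 1 chain supplies the
    negative-phase samples that would replace CD samples during training.
    """
    K = len(betas)
    vs = [rng.integers(0, 2, size=b.shape).astype(float) for _ in range(K)]
    hs = [np.zeros(c.shape) for _ in range(K)]
    for _ in range(n_steps):
        for k in range(K):
            vs[k], hs[k] = gibbs_step(vs[k], W, b, c, betas[k])
        for k in range(K - 1):   # Metropolis swap between adjacent temperatures
            dE = rbm_energy(vs[k], hs[k], W, b, c) - rbm_energy(vs[k + 1], hs[k + 1], W, b, c)
            if rng.random() < np.exp(min(0.0, (betas[k] - betas[k + 1]) * dE)):
                vs[k], vs[k + 1] = vs[k + 1], vs[k]
                hs[k], hs[k + 1] = hs[k + 1], hs[k]
    return vs[-1]   # sample from the beta = 1 chain

# Hypothetical tiny RBM
W = rng.normal(scale=0.1, size=(5, 3))
b, c = np.zeros(5), np.zeros(3)
print(parallel_tempering_sample(W, b, c, betas=np.linspace(0.1, 1.0, 5)))
```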


Neural Computation | 2013

Enhanced gradient for training restricted Boltzmann machines

KyungHyun Cho; Tapani Raiko; Alexander Ilin

Restricted Boltzmann machines (RBMs) are often used as building blocks in greedy learning of deep networks. However, training this simple model can be laborious. Traditional learning algorithms often converge only with the right choice of metaparameters that specify, for example, learning rate scheduling and the scale of the initial weights. They are also sensitive to the specific data representation: an equivalent RBM can be obtained by flipping some bits and changing the weights and biases accordingly, but traditional learning rules are not invariant to such transformations. Without careful tuning of these training settings, traditional algorithms can easily get stuck or even diverge. In this letter, we present an enhanced gradient that is derived to be invariant to bit-flipping transformations. We experimentally show that the enhanced gradient yields more stable training of RBMs both with a fixed learning rate and with an adaptive one.
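
A rough sketch of the covariance form commonly associated with the enhanced weight gradient follows; the corresponding bias corrections are omitted and the batch statistics are hypothetical, so treat this as an illustration of the invariance idea rather than the paper's full update rule.

```python
import numpy as np

def enhanced_weight_gradient(v_data, h_data, v_model, h_model):
    """Sketch of a covariance-based RBM weight gradient.

    The standard RBM weight gradient is <v h^T>_data - <v h^T>_model; the
    enhanced gradient is often described as replacing these raw second
    moments with covariances, which makes the weight update invariant to
    flipping the meaning of individual bits (0 <-> 1). The bias updates
    receive matching correction terms that are omitted here.
    """
    def cov(v, h):
        vc = v - v.mean(axis=0)
        hc = h - h.mean(axis=0)
        return vc.T @ hc / v.shape[0]

    return cov(v_data, h_data) - cov(v_model, h_model)

# Hypothetical mini-batches of data-phase and model-phase samples
rng = np.random.default_rng(0)
v_d, h_d = rng.integers(0, 2, (32, 6)), rng.integers(0, 2, (32, 4))
v_m, h_m = rng.integers(0, 2, (32, 6)), rng.integers(0, 2, (32, 4))
print(enhanced_weight_gradient(v_d, h_d, v_m, h_m).shape)
```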


International Symposium on Neural Networks | 2013

Gaussian-Bernoulli deep Boltzmann machine

Kyunghyun Cho; Tapani Raiko; Alexander Ilin

In this paper, we study a model that we call the Gaussian-Bernoulli deep Boltzmann machine (GDBM) and discuss potential improvements in training the model. GDBM is designed to be applicable to continuous data and is constructed from the Gaussian-Bernoulli restricted Boltzmann machine (GRBM) by adding multiple layers of binary hidden neurons. The studied improvements of the learning algorithm for GDBM include parallel tempering, the enhanced gradient, an adaptive learning rate, and layer-wise pretraining. We empirically show that they help avoid some of the common difficulties found in training deep Boltzmann machines, such as divergence of learning, the difficulty of choosing the right learning rate schedule, and the existence of meaningless higher layers.
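
A small sketch of how the middle hidden layer of such a stack is driven from both sides is given below; the 1/sigma^2 scaling of the visible input is an assumption about the GRBM parameterization, and all sizes and values are hypothetical.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def hidden1_activation(v, h2, W1, W2, c1, sigma):
    """Sketch of the first binary hidden layer of a GDBM being driven from
    both sides: by the Gaussian visible layer below and the second binary
    hidden layer above.

    p(h1_j = 1 | v, h2) = sigmoid( sum_i (v_i / sigma_i^2) W1_ij
                                   + sum_k W2_jk h2_k + c1_j )
    The exact scaling of the visible term depends on the GRBM
    parameterization used, so treat the 1/sigma^2 factor as an assumption.
    """
    return sigmoid((v / sigma ** 2) @ W1 + W2 @ h2 + c1)

# Hypothetical tiny GDBM: 4 visible units, 3 units in h1, 2 units in h2
rng = np.random.default_rng(0)
v = rng.normal(size=4)
h2 = rng.integers(0, 2, size=2).astype(float)
W1 = rng.normal(scale=0.1, size=(4, 3))
W2 = rng.normal(scale=0.1, size=(3, 2))
c1, sigma = np.zeros(3), np.ones(4)
print(hidden1_activation(v, h2, W1, W2, c1, sigma))
```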


Pacific Symposium on Biocomputing | 2002

Towards discovering structural signatures of protein folds based on logical hidden Markov models.

Kristian Kersting; Tapani Raiko; Stefan Kramer; Luc De Raedt

With the growing number of determined protein structures and the availability of classification schemes, it becomes increasingly important to develop computer methods that automatically extract structural signatures for classes of proteins. In this paper, we introduce and apply a new Machine Learning technique, Logical Hidden Markov Models (LOHMMs), to the task of finding structural signatures of folds according to the classification scheme SCOP. Our results indicate that LOHMMs are applicable to this task and possess several advantages over other approaches.


International Conference on Neural Information Processing | 2008

Natural Conjugate Gradient in Variational Inference

Antti Honkela; Matti Tornio; Tapani Raiko; Juha Karhunen

Variational methods for approximate inference in machine learning often adapt a parametric probability distribution to optimize a given objective function. This view is especially useful when applying variational Bayes (VB) to models outside the conjugate-exponential family. For them, variational Bayesian expectation maximization (VB EM) algorithms are not easily available, and gradient-based methods are often used as alternatives. Traditional natural gradient methods use the Riemannian structure (or geometry) of the predictive distribution to speed up maximum likelihood estimation. We propose using the geometry of the variational approximating distribution instead to speed up a conjugate gradient method for variational learning and inference. The computational overhead is small due to the simplicity of the approximating distribution. Experiments with real-world speech data show significant speedups over alternative learning algorithms.
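
As a minimal illustration: for a diagonal Gaussian approximating distribution q(theta) = N(mu, diag(sigma2)), the Fisher information with respect to the means is diag(1/sigma2), so premultiplying the ordinary gradient by the variances gives the natural gradient of the means. The sketch below shows only this single rescaled step, with hypothetical values; the conjugate-gradient machinery of the paper is not reproduced.

```python
import numpy as np

def natural_gradient_step(mu, sigma2, grad_mu, lr=0.1):
    """Natural-gradient update of the means of a diagonal Gaussian q.

    For q(theta) = N(mu, diag(sigma2)), the Fisher information w.r.t. mu
    is diag(1/sigma2), so the natural gradient is sigma2 * grad_mu. The
    paper feeds such Riemannian gradients into a conjugate-gradient
    scheme, which is not shown here.
    """
    return mu - lr * sigma2 * grad_mu

# Hypothetical gradient and poorly scaled posterior variances
mu = np.array([0.0, 0.0, 0.0])
sigma2 = np.array([0.01, 1.0, 4.0])
grad_mu = np.array([1.0, 1.0, 1.0])
print(natural_gradient_step(mu, sigma2, grad_mu))
```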


Neurocomputing | 2015

Self-organization and missing values in SOM and GTM

Tommi Vatanen; Maria Osmala; Tapani Raiko; Krista Lagus; Marko Sysi-Aho; Matej Orešič; Timo Honkela; Harri Lähdesmäki

In this paper, we study fundamental properties of the Self-Organizing Map (SOM) and the Generative Topographic Mapping (GTM), the ramifications of initializing the algorithms, and the behaviour of the algorithms in the presence of missing data. We show that the commonly used principal component analysis (PCA) initialization of the GTM does not guarantee good learning results with high-dimensional data. Initializing the GTM with the SOM is shown to yield improvements in self-organization with three high-dimensional data sets: the commonly used MNIST and ISOLET data sets and an epigenomic ENCODE data set. We also propose a revised treatment of missing data for the batch SOM algorithm, called the Imputation SOM, and show that the new algorithm is more robust in the presence of missing data. We benchmark the performance of the topographic mappings in the missing value imputation task and conclude that there are better methods for this particular task. Finally, we announce a revised version of the SOM Toolbox for Matlab with added GTM functionality.
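
A minimal sketch of the missing-data handling at the heart of such methods, assuming a best-matching-unit search restricted to observed dimensions (the Imputation SOM's additional imputation step is not shown, and the codebook is hypothetical):

```python
import numpy as np

def bmu_with_missing(x, mask, codebook):
    """Find the best-matching unit for a sample with missing values.

    Distances are computed over observed dimensions only, which is the
    usual way batch SOM variants cope with missing data; the Imputation
    SOM described in the paper additionally fills in missing components
    from the winning prototypes during training (not shown here).
    """
    diffs = (codebook - x) * mask          # zero out missing dimensions
    dists = np.sum(diffs ** 2, axis=1)
    return int(np.argmin(dists))

# Hypothetical 4-unit codebook in a 3-D data space
codebook = np.array([[0.0, 0.0, 0.0],
                     [1.0, 1.0, 1.0],
                     [0.0, 1.0, 0.0],
                     [1.0, 0.0, 1.0]])
x = np.array([0.9, 0.0, 0.8])              # second component is a placeholder
mask = np.array([1.0, 0.0, 1.0])           # ... because it is missing
print(bmu_with_missing(x, mask, codebook))
```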


Neurocomputing | 2009

Variational Bayesian learning of nonlinear hidden state-space models for model predictive control

Tapani Raiko; Matti Tornio

This paper studies system identification and model predictive control for nonlinear hidden state-space models. Nonlinearities are modelled with neural networks and system identification is done with variational Bayesian learning. In addition to the robustness of control, the stochastic approach allows for various control schemes, including combinations of direct and indirect control, as well as using probabilistic inference for control. We study the noise robustness, speed, and accuracy of three different control schemes, as well as the effect of changing horizon lengths and initialisation methods, using a simulated cart-pole system. The simulations indicate that the proposed method is able to find a representation of the system state that makes control easier, especially under high noise.
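
For context, a toy model-predictive control loop by random shooting over a learned dynamics model is sketched below; the dynamics, cost, and parameters are hypothetical stand-ins, and the paper's variational Bayesian state-space model and control schemes are not reproduced.

```python
import numpy as np

def mpc_random_shooting(f, cost, s0, horizon=10, n_candidates=200, seed=0):
    """Toy model-predictive control by random shooting.

    'f' is a learned (here deterministic) dynamics model s' = f(s, u) and
    'cost' a per-step cost; the first action of the best sampled control
    sequence is returned. The paper instead uses a variational-Bayesian
    nonlinear state-space model and richer control schemes, so this only
    sketches the generic MPC idea.
    """
    rng = np.random.default_rng(seed)
    best_u0, best_total = None, np.inf
    for _ in range(n_candidates):
        u_seq = rng.uniform(-1.0, 1.0, size=horizon)
        s, total = s0, 0.0
        for u in u_seq:
            s = f(s, u)
            total += cost(s, u)
        if total < best_total:
            best_u0, best_total = u_seq[0], total
    return best_u0

# Hypothetical 1-D linear system standing in for the learned model
f = lambda s, u: 0.9 * s + 0.5 * u
cost = lambda s, u: s ** 2 + 0.01 * u ** 2
print(mpc_random_shooting(f, cost, s0=2.0))
```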

Collaboration


Dive into Tapani Raiko's collaborations.

Top Co-Authors

Harri Valpola

Helsinki University of Technology

Antti Honkela

Helsinki Institute for Information Technology

Matti Tornio

Helsinki University of Technology

Kristian Kersting

Technical University of Dortmund
