Timo Similä
Helsinki University of Technology
Publications
Featured research published by Timo Similä.
International Conference on Artificial Neural Networks | 2005
Timo Similä; Jarkko Tikka
Sparse regression is the problem of selecting a parsimonious subset of all available regressors for an efficient prediction of a target variable. We consider a general setting in which both the target and regressors may be multivariate. The regressors are selected by a forward selection procedure that extends the Least Angle Regression algorithm. Instead of the common practice of estimating each target variable individually, our proposed method chooses sequentially those regressors that allow, on average, the best predictions of all the target variables. We illustrate the procedure by an experiment with artificial data. The method is also applied to the task of selecting relevant pixels from images in multidimensional scaling of handwritten digits.
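The selection idea can be sketched in a few lines. The sketch below is a plain greedy variant with ordinary least squares refitting, not the paper's exact Least Angle Regression extension; the function name and the fixed subset size `k` are illustrative assumptions:

```python
import numpy as np

def greedy_multiresponse_select(X, Y, k):
    """Greedy forward selection for multiresponse regression (illustrative
    sketch only). At each step, pick the column of X whose summed absolute
    correlation with the current multivariate residual is largest, then
    refit all targets by least squares on the selected columns."""
    selected = []
    residual = Y.copy()
    for _ in range(k):
        # score each input by its total correlation with all target residuals
        scores = np.abs(X.T @ residual).sum(axis=1)
        scores[selected] = -np.inf  # never re-select an input
        selected.append(int(np.argmax(scores)))
        # refit on the selected subset and update the residual
        Xs = X[:, selected]
        W, *_ = np.linalg.lstsq(Xs, Y, rcond=None)
        residual = Y - Xs @ W
    return selected
```

Because the score sums over all responses, an input is chosen only if it helps predict the targets well on average, which is the distinguishing feature the abstract describes.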
Computational Statistics & Data Analysis | 2007
Timo Similä; Jarkko Tikka
The regression problem of modeling several response variables using the same set of input variables is considered. The model is linearly parameterized and the parameters are estimated by minimizing the error sum of squares subject to a sparsity constraint. The constraint has the effect of eliminating useless inputs and constraining the parameters of the remaining inputs in the model. Two algorithms for solving the resulting convex cone programming problem are proposed. The first algorithm gives a pointwise solution, while the second one computes the entire path of solutions as a function of the constraint parameter. Based on experiments with real data sets, the proposed method has a similar performance to existing methods. In simulation experiments, the proposed method is competitive both in terms of prediction accuracy and correctness of input selection. The advantages become more apparent when many correlated inputs are available for model construction.
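A minimal sketch of the constrained formulation, assuming the sparsity constraint bounds the sum of Euclidean row norms of the coefficient matrix. The projected-gradient solver below is an illustrative stand-in for the paper's cone-programming algorithms, not a reconstruction of them:

```python
import numpy as np

def project_l1(v, t):
    """Euclidean projection of a nonnegative vector onto {u >= 0, sum(u) <= t}."""
    if v.sum() <= t:
        return v
    u = np.sort(v)[::-1]
    css = np.cumsum(u)
    rho = np.nonzero(u * np.arange(1, len(v) + 1) > (css - t))[0][-1]
    theta = (css[rho] - t) / (rho + 1)
    return np.maximum(v - theta, 0.0)

def constrained_row_sparse(X, Y, t, iters=500):
    """Projected gradient for min (1/2)||Y - XW||_F^2 s.t. sum_i ||w_i|| <= t.
    Projecting onto the sum-of-row-norms ball reduces to projecting the
    vector of row norms onto an L1 ball and rescaling each row."""
    p, m = X.shape[1], Y.shape[1]
    W = np.zeros((p, m))
    L = np.linalg.norm(X, 2) ** 2  # Lipschitz constant of the gradient
    for _ in range(iters):
        W = W - (X.T @ (X @ W - Y)) / L
        norms = np.sqrt((W ** 2).sum(axis=1))
        target = project_l1(norms, t)
        scale = np.where(norms > 0, target / np.maximum(norms, 1e-12), 0.0)
        W = W * scale[:, None]
    return W
```

Tightening `t` drives entire rows of W to zero, which is exactly the input-elimination effect the abstract describes; sweeping `t` over a grid mimics (crudely) the solution path of the second algorithm.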
International Joint Conference on Neural Networks | 2006
Timo Similä; Jarkko Tikka
We propose the multiresponse sparse regression algorithm, an input selection method for the purpose of estimating several response variables. It is a forward selection procedure for linearly parameterized models, which updates with carefully chosen step lengths. The step length rule extends the correlation criterion of the least angle regression algorithm to many responses. We present a general concept and explicit formulas for three different variants of the algorithm. Based on experiments with simulated data, the proposed method competes favorably with other methods when many correlated inputs are available for model construction. We also study the performance with several real data sets.
Pattern Recognition Letters | 2009
Timo Similä; Jarkko Tikka
Choosing a useful combination of input variables and an appropriate complexity of the model is an essential task in nonlinear regression analysis because of the risk of overfitting. This article provides a workable solution for the multilayer perceptron model. An initial structure of the model, including all the input variables, is fixed in the beginning. Only the most useful input variables and hidden nodes remain effective when the model is fitted with the proposed penalization method. The method is tested on three benchmark data sets. Experimental results show that the removal of useless input variables and hidden nodes from the model improves its generalization capability. In addition, the proposed method compares favorably with respect to other penalization methods.
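To make the pruning idea concrete, here is a toy sketch for a one-hidden-layer network that penalizes the group norm of each input's fan-out weights and the magnitude of each hidden node's output weight. The penalty form, hyperparameters, and function name are illustrative assumptions, not the paper's exact method:

```python
import numpy as np

def prune_mlp_inputs(X, y, h=8, lam=0.02, lr=0.05, iters=2000, seed=0):
    """Train a tiny one-hidden-layer MLP with subgradient descent on
    squared error plus group penalties. Rows of W1 belonging to useless
    inputs (and output weights of useless hidden nodes) shrink toward
    zero, so they can be pruned after training."""
    rng = np.random.default_rng(seed)
    n, p = X.shape
    W1 = rng.normal(scale=0.3, size=(p, h))  # input-to-hidden weights
    W2 = rng.normal(scale=0.3, size=h)       # hidden-to-output weights
    for _ in range(iters):
        H = np.tanh(X @ W1)
        err = H @ W2 - y
        # backprop of the mean squared error, plus an L1 penalty on W2
        # that allows pruning of hidden nodes
        gW2 = H.T @ err / n + lam * np.sign(W2)
        gH = np.outer(err, W2) * (1.0 - H ** 2)
        gW1 = X.T @ gH / n
        # group penalty: each input variable is penalized through the
        # Euclidean norm of its fan-out weight row in W1
        norms = np.sqrt((W1 ** 2).sum(axis=1)) + 1e-8
        gW1 += lam * W1 / norms[:, None]
        W1 -= lr * gW1
        W2 -= lr * gW2
    return W1, W2
```

After training, input variables whose W1 row norm is near zero contribute nothing to the hidden activations and can be removed, which is the generalization-improving effect the abstract reports.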
Neural Processing Letters | 2013
Zhanxing Zhu; Timo Similä; Francesco Corona
In this work, we consider dimensionality reduction in supervised settings and, specifically, we focus on regression problems. A novel algorithm, the supervised distance preserving projection (SDPP), is proposed. The SDPP minimizes, locally, the difference between pairwise distances among projected input covariates and the corresponding distances among responses. As a result, the local geometrical structure of the low-dimensional subspace retrieved by the SDPP mimics that of the response space, which not only facilitates an efficient regressor design but also uncovers useful information for visualization. The SDPP achieves this goal by learning a linear parametric mapping and, thus, it can easily handle out-of-sample data points. For nonlinear data, a kernelized version of the SDPP is also derived. In addition, an intuitive extension of the SDPP is proposed to deal with classification problems. The experimental evaluation on both synthetic and real-world data sets demonstrates the effectiveness of the SDPP, showing that it performs on par with or better than state-of-the-art approaches.
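The core objective can be illustrated as follows. This is a sketch of the idea only; the paper's exact weighting and neighborhood construction may differ:

```python
import numpy as np

def sdpp_objective(P, X, Y, neighbors):
    """SDPP-style loss (illustrative sketch): for each locally neighboring
    pair (i, j), penalize the squared difference between the pairwise
    distance of the projected covariates and the distance of the
    corresponding responses."""
    Z = X @ P  # linear parametric projection of the inputs
    loss = 0.0
    for i, j in neighbors:
        dz = np.sum((Z[i] - Z[j]) ** 2)  # squared distance in the subspace
        dy = np.sum((Y[i] - Y[j]) ** 2)  # squared distance among responses
        loss += (dz - dy) ** 2
    return loss
```

A projection P that drives this loss to zero makes local distances in the learned subspace match local distances among responses, which is the distance-preserving property the abstract describes.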
Information Visualization | 2005
Timo Similä
One of the main tasks in exploratory data analysis is to create an appropriate representation for complex data. In this paper, the problem of creating a representation for observations lying on a low-dimensional manifold embedded in high-dimensional coordinates is considered. We propose a modification of the Self-organizing map (SOM) algorithm that is able to learn the manifold structure in the high-dimensional observation coordinates. Any manifold learning algorithm may be incorporated into the proposed training strategy to guide the map onto the manifold surface instead of becoming trapped in local minima. In this paper, the Locally linear embedding algorithm is adopted. We apply the proposed method successfully to several data sets with manifold geometry, including an illustrative example of a surface as well as image data. We also show with other experiments that the advantage of the method over the basic SOM is restricted to this specific type of data.
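For readers unfamiliar with the baseline, a minimal online SOM in its standard form follows. This is the plain algorithm, not the manifold-guided variant proposed in the paper, and all hyperparameters are illustrative:

```python
import numpy as np

def train_som(data, grid_w, grid_h, iters=1000, seed=0):
    """Minimal online SOM sketch. Each step picks a random observation,
    finds its best-matching unit (BMU), and pulls the BMU and its lattice
    neighbors toward the observation with a shrinking neighborhood."""
    rng = np.random.default_rng(seed)
    codebook = rng.normal(size=(grid_w * grid_h, data.shape[1]))
    # fixed 2-D lattice coordinates of the map units
    coords = np.array([(i, j) for i in range(grid_w) for j in range(grid_h)], float)
    for t in range(iters):
        x = data[rng.integers(len(data))]
        bmu = np.argmin(((codebook - x) ** 2).sum(axis=1))
        sigma = 2.0 * np.exp(-t / iters)   # shrinking neighborhood radius
        lr = 0.5 * np.exp(-t / iters)      # decaying learning rate
        h = np.exp(-((coords - coords[bmu]) ** 2).sum(axis=1) / (2 * sigma ** 2))
        codebook += lr * h[:, None] * (x - codebook)
    return codebook
```

The paper's modification replaces the unconstrained update above with one guided by a manifold learning step (Locally linear embedding), so that the codebook settles on the manifold surface rather than in an arbitrary local minimum.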
International Conference on Acoustics, Speech, and Signal Processing | 2007
Timo Similä
Multiresponse sparse regression is the problem of estimating many response variables using a common subset of input variables. Our model is linear, so row sparsity of the coefficient matrix implies subset selection. This is formulated as the problem of minimizing the residual sum of squares, where the row norms of the coefficient matrix are penalized. The proposed approach differs from existing ones in that any penalty function that is increasing, differentiable, and concave can be used. A convergent majorize-minimize algorithm is adopted for minimization. We also propose an active set strategy for tracking the nonzero rows of the coefficient matrix when the minimization is performed for a sequence of descending values of the penalty parameter. Numerical experiments are given to illustrate the active set strategy and analyze penalization with different degrees of concavity.
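One classical way to realize the majorize-minimize step is iteratively reweighted ridge regression, sketched below under the assumption that the objective is (1/2)||Y - XW||_F^2 + lam * sum_i p(||w_i||) with p increasing, differentiable, and concave. The paper's algorithm and its active set strategy differ in detail:

```python
import numpy as np

def mm_row_sparse(X, Y, penalty_deriv, lam, iters=100, eps=1e-8):
    """Majorize-minimize sketch for row-sparse multiresponse regression.
    The concave penalty p(||w_i||) is majorized by a quadratic in each
    row norm, so every iteration reduces to a weighted ridge problem
    with a closed-form solution."""
    p, m = X.shape[1], Y.shape[1]
    W = np.zeros((p, m))
    G = X.T @ X
    XtY = X.T @ Y
    for _ in range(iters):
        norms = np.sqrt((W ** 2).sum(axis=1)) + eps
        # per-row ridge weight from the linear majorizer of the penalty:
        # lam * p'(||w_i||) / ||w_i||
        d = lam * penalty_deriv(norms) / norms
        W = np.linalg.solve(G + np.diag(d), XtY)
    return W
```

With the limiting choice p(u) = u (so `penalty_deriv` returns ones), the iteration targets the convex group-lasso objective; more concave choices of p shrink large rows less while still driving useless rows toward zero, which is the degrees-of-concavity trade-off the abstract analyzes.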
Intelligent Data Engineering and Automated Learning | 2006
Risto M. Hakala; Timo Similä; Miki Sirola; Jukka Parviainen
The self-organizing map (SOM) [1] is used in data analysis for resolving and visualizing nonlinear relationships in complex data. This paper presents an application of the SOM for depicting the state and progress of a real-time process. A self-organizing map is used as a visual regression model for estimating the state configuration and progress of an observation in process data. The proposed technique is used for examining full-scope nuclear power plant simulator data. One aim is to depict only the most relevant information of the process so that interpreting process behaviour becomes easier for plant operators. In our experiments, the method was able to detect a leakage situation at an early stage, and it was possible to observe how the system changed its state as time went on.
International Conference on Neural Information Processing | 2004
Sampsa Laine; Timo Similä
We propose a robust and understandable algorithm for supervised variable selection. The user defines a problem by manually selecting the variables Y that are used to train a Self-Organizing Map (SOM), which best describes the problem of interest. This is an illustrative problem definition even in the multivariate case. The user also defines another set X, which contains variables that may be related to the problem. Our algorithm browses subsets of X and returns the one that contains the most information about the user's problem. We measure information by mapping small areas of the studied subset to the SOM lattice. We return the variable set providing, on average, the most compact mapping. By analysis of public domain data sets and by comparison against other variable selection methods, we illustrate the main benefit of our method: understandability to the common user.
International Journal of Neural Systems | 2005
Timo Similä; Sampsa Laine
Practical data analysis often encounters data sets with both relevant and useless variables. Supervised variable selection is the task of selecting the relevant variables based on some predefined criterion. We propose a robust method for this task. The user manually selects a set of target variables and trains a Self-Organizing Map with these data. This sets a criterion for variable selection and is an illustrative description of the user's problem, even for multivariate target data. The user also defines another set of variables that are potentially related to the problem. Our method returns a subset of these variables, which best corresponds to the description provided by the Self-Organizing Map and, thus, agrees with the user's understanding about the problem. The method is conceptually simple and, based on experiments, allows an accessible approach to supervised variable selection.