Emil Eirola
Aalto University
Publication
Featured research published by Emil Eirola.
Neurocomputing | 2014
Emil Eirola; Amaury Lendasse; Vincent Vandewalle; Christophe Biernacki
Many data sets have missing values in practical application contexts, but the majority of commonly studied machine learning methods cannot be applied directly when there are incomplete samples. However, most such methods only depend on the relative differences between samples instead of their particular values, and thus one useful approach is to directly estimate the pairwise distances between all samples in the data set. This is accomplished by fitting a Gaussian mixture model to the data, and using it to derive estimates for the distances. A variant of the model for high-dimensional data with missing values is also studied. Experimental simulations confirm that the proposed method provides accurate estimates compared to alternative methods for estimating distances. In particular, using the mixture model for estimating distances is on average more accurate than using the same model to impute any missing values and then calculating distances. The experimental evaluation additionally shows that more accurately estimated distances lead to improved prediction performance for classification and regression tasks when used as inputs for a neural network.
Information Sciences | 2013
Emil Eirola; Gauthier Doquire; Michel Verleysen; Amaury Lendasse
The possibility of missing or incomplete data is often ignored when describing statistical or machine learning methods, but as it is a common problem in practice, it is relevant to consider. A popular strategy is to fill in the missing values by imputation as a pre-processing step, but for many methods this is not necessary, and can yield sub-optimal results. Instead, appropriately estimating pairwise distances in a data set directly enables the use of any machine learning methods using nearest neighbours or otherwise based on distances between samples. In this paper, it is shown how directly estimating distances tends to result in more accurate results than calculating distances from an imputed data set, and an algorithm to calculate the estimated distances is presented. The theoretical framework operates under the assumption of a multivariate normal distribution, but the algorithm is shown to be robust to violations of this assumption. The focus is on numerical data with a considerable proportion of missing values, and simulated experiments are provided to show accurate performance on several data sets.
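The idea of estimating a distance directly, rather than imputing first, can be illustrated with a minimal sketch. It assumes a single multivariate normal with known `mean` and `cov` (in practice these would have to be estimated from the incomplete data), marks missing entries with NaN, and treats the two samples as independent. Under those assumptions the expected squared distance is the squared distance between the conditional-mean-filled vectors plus the conditional variances of the missing entries. The function names are hypothetical and this is not the authors' published algorithm.

```python
import numpy as np

def conditional_fill(x, mean, cov):
    """Fill NaNs in x with their conditional mean under N(mean, cov).

    Returns the filled vector and the total conditional variance
    (trace of the conditional covariance of the missing entries).
    """
    x = np.asarray(x, dtype=float)
    miss = np.isnan(x)
    filled = x.copy()
    if not miss.any():
        return filled, 0.0
    obs = ~miss
    if not obs.any():
        # Nothing observed: fall back to the marginal distribution.
        return np.asarray(mean, dtype=float).copy(), float(np.trace(cov))
    s_oo = cov[np.ix_(obs, obs)]
    s_mo = cov[np.ix_(miss, obs)]
    s_mm = cov[np.ix_(miss, miss)]
    # gain = s_mo @ inv(s_oo), computed without an explicit inverse.
    gain = np.linalg.solve(s_oo, s_mo.T).T
    filled[miss] = mean[miss] + gain @ (x[obs] - mean[obs])
    cond_cov = s_mm - gain @ s_mo.T
    return filled, float(np.trace(cond_cov))

def expected_sq_distance(xi, xj, mean, cov):
    """Expected squared Euclidean distance between two incomplete samples."""
    fi, var_i = conditional_fill(xi, mean, cov)
    fj, var_j = conditional_fill(xj, mean, cov)
    return float(np.sum((fi - fj) ** 2) + var_i + var_j)
```

Computing the distance from the filled vectors alone would drop the two variance terms, which is one way to see why distances calculated after imputation tend to be underestimated.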
International Conference on Artificial Neural Networks | 2013
Amaury Lendasse; Anton Akusok; Olli Simula; Francesco Corona; Mark van Heeswijk; Emil Eirola; Yoan Miche
This paper describes the original (basic) Extreme Learning Machine (ELM). Properties such as robustness and sensitivity to variable selection are studied. Several extensions of the original ELM are then presented and compared. First, the Tikhonov-Regularized Optimally-Pruned Extreme Learning Machine (TROP-ELM) is summarized as an improvement of the Optimally-Pruned Extreme Learning Machine (OP-ELM) in the form of an L2 regularization penalty applied within the OP-ELM. Second, a Methodology to Linearly Ensemble ELM (ELM-ELM) is presented in order to improve the performance of the original ELM. These methodologies (TROP-ELM and ELM-ELM) are tested against state-of-the-art methods such as Support Vector Machines and Gaussian Processes, as well as the original ELM and OP-ELM, on ten different data sets. A specific experiment to test the sensitivity of these methodologies to variable selection is also presented.
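For readers unfamiliar with the basic ELM, the following is a minimal sketch: a single hidden layer with random, untrained weights, and output weights obtained by solving a linear least-squares problem. The optional `ridge` term is a plain L2 (Tikhonov) penalty on the output weights, included only to illustrate the kind of regularization mentioned above; it is not TROP-ELM, which also involves pruning. The function names, the tanh activation, and the default parameters are assumptions made for illustration.

```python
import numpy as np

def elm_fit(X, y, n_hidden=50, ridge=0.0, seed=0):
    """Fit a basic ELM: random hidden layer, linear solve for the output."""
    rng = np.random.default_rng(seed)
    W = rng.standard_normal((X.shape[1], n_hidden))  # random input weights
    b = rng.standard_normal(n_hidden)                # random biases
    H = np.tanh(X @ W + b)                           # hidden-layer outputs
    if ridge > 0:
        # L2-regularized normal equations.
        A = H.T @ H + ridge * np.eye(n_hidden)
        beta = np.linalg.solve(A, H.T @ y)
    else:
        # Ordinary least squares (minimum-norm solution).
        beta = np.linalg.lstsq(H, y, rcond=None)[0]
    return W, b, beta

def elm_predict(X, model):
    """Apply a fitted ELM to new inputs."""
    W, b, beta = model
    return np.tanh(X @ W + b) @ beta
```

Only `beta` is learned from the data; the hidden layer is never trained, which is why fitting reduces to one linear solve.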
Trust, Security and Privacy in Computing and Communications | 2015
Luiza Sayfullina; Emil Eirola; Dmitry Komashinsky; Paolo Palumbo; Yoan Miche; Amaury Lendasse; Juha Karhunen
According to a recent F-Secure report, 97% of mobile malware is designed for the Android platform, which has a growing number of consumers. In order to protect consumers from downloading malicious applications, there should be an effective system of malware classification that can detect previously unseen viruses. In this paper, we present a scalable and highly accurate method for malware classification based on features extracted from Android application package (APK) files. We explored several techniques for tackling independence assumptions in Naive Bayes and proposed a Normalized Bernoulli Naive Bayes classifier that resulted in improved class separation and higher accuracy. We conducted a set of experiments on an up-to-date large dataset of APKs provided by F-Secure and achieved a 0.1% false positive rate with an overall accuracy of 91%.
International Symposium on Neural Networks | 2014
Emil Eirola; Amaury Lendasse; Francesco Corona; Michel Verleysen
Feature selection is essential in many machine learning problems, but it is often not clear on which grounds variables should be included or excluded. This paper shows that the mean squared leave-one-out error of the first-nearest-neighbour estimator is effective as a cost function when selecting input variables for regression tasks. A theoretical analysis of the estimator's properties is presented to support its use for feature selection. An experimental comparison to alternative selection criteria (including mutual information, least angle regression, and the RReliefF algorithm) demonstrates reliable performance on several regression tasks.
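The cost function described here is simple to compute. The sketch below is an illustrative brute-force version, assuming Euclidean distances and a data set small enough for an n-by-n distance matrix; the function name is hypothetical. Each sample is predicted by the output of its nearest other sample, and the squared errors are averaged.

```python
import numpy as np

def loo_1nn_mse(X, y):
    """Mean squared leave-one-out error of the 1-nearest-neighbour estimator."""
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    # Pairwise squared Euclidean distances between all samples.
    diff = X[:, None, :] - X[None, :, :]
    d2 = np.sum(diff ** 2, axis=2)
    # Leave-one-out: a sample may not be its own neighbour.
    np.fill_diagonal(d2, np.inf)
    nearest = np.argmin(d2, axis=1)
    return float(np.mean((y - y[nearest]) ** 2))
```

To compare candidate variable subsets, the function can be evaluated on the corresponding columns, for example `loo_1nn_mse(X[:, [0, 2]], y)`, and the subset with the lowest value preferred.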
Intelligent Data Analysis | 2013
Emil Eirola; Amaury Lendasse
Gaussian mixture models provide an appealing tool for time series modelling. By embedding the time series to a higher-dimensional space, the density of the points can be estimated by a mixture model. The model can directly be used for short-to-medium term forecasting and missing value imputation. The modelling setup introduces some restrictions on the mixture model, which when appropriately taken into account result in a more accurate model. Experiments on time series forecasting show that including the constraints in the training phase particularly reduces the risk of overfitting in challenging situations with missing values or a large number of Gaussian components.
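The embedding step mentioned above can be sketched as a plain delay embedding, in which each row holds `dim` consecutive values of the series. This is an illustration only: the function name and the unit lag are assumptions, and the sketch covers neither the mixture model fitting nor the constraints discussed in the abstract.

```python
import numpy as np

def delay_embed(series, dim):
    """Stack consecutive windows of length `dim` as rows of a matrix."""
    s = np.asarray(series, dtype=float)
    n_rows = s.size - dim + 1
    if n_rows < 1:
        raise ValueError("series is shorter than the embedding dimension")
    # Column i holds the series shifted by i time steps.
    return np.stack([s[i:i + n_rows] for i in range(dim)], axis=1)
```

A density model fitted to these rows describes the joint distribution of `dim` consecutive values, so conditioning on the first `dim - 1` entries of a row gives a one-step-ahead forecast of the last entry.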
Pervasive Technologies Related to Assistive Environments | 2016
Kaj-Mikael Björk; Emil Eirola; Yoan Miche; Amaury Lendasse
In our ever more complex world, the field of analytics has dramatically increased in importance. Gut feeling is no longer sufficient in decision making; intuition has to be combined with support from the huge amount of data available today. Even if the amount of data is enormous, the quality of the data is not always good. Problems arise in at least two situations: i) the data is imprecise by nature and ii) the data is incomplete (or there are missing parts in the data set). Both situations are problematic and need to be addressed appropriately. If these problems are solved, applications are to be found in various interesting fields. We aim at achieving significant methodology development as well as creative solutions in the domains of medicine, information systems and risk management. This paper focuses especially on missing data problems in the field of medicine when presenting a new project in its very first phase.
Pervasive Technologies Related to Assistive Environments | 2017
Anton Akusok; Emil Eirola; Kaj-Mikael Björk; Yoan Miche; Hans J. Johnson; Amaury Lendasse
This paper presents a novel procedure to train Extreme Learning Machine models on datasets with missing values. In effect, a separate model is learned to classify every sample in the test set; however, this is accomplished in an efficient manner which does not require accessing the training data repeatedly. Instead, a sparse structure is imposed on the input layer weights, which enables calculating the necessary statistics in the training phase. An application to predicting the progression of Huntington's disease from brain scans is presented. Experimental comparisons show promising results equivalent to the state of the art in machine learning with incomplete data.
International Symposium on Neural Networks | 2014
Emil Eirola; Amaury Lendasse; Juha Karhunen
Variable selection is a crucial part of building regression models, and is preferably done as a filtering method independently from the model training. Mutual information is a popular relevance criterion for this, but it is not trivial to estimate accurately from a limited amount of data. In this paper, a method is presented where a Gaussian mixture model is used to estimate the joint density of the input and output variables, and subsequently used to select the most relevant variables by maximising the mutual information which can be estimated using the model.
Neurocomputing | 2013
Qi Yu; Yoan Miche; Emil Eirola; Mark van Heeswijk; Eric Séverin; Amaury Lendasse