Network


Latest external collaborations at the country level. Dive into details by clicking on the dots.

Hotspot


Dive into the research topics where Martin Wistuba is active.

Publications


Featured research published by Martin Wistuba.


Knowledge Discovery and Data Mining | 2014

Learning time-series shapelets

Josif Grabocka; Nicolas Schilling; Martin Wistuba; Lars Schmidt-Thieme

Shapelets are discriminative sub-sequences of time series that best predict the target variable. For this reason, shapelet discovery has recently attracted considerable interest within the time-series research community. Currently, shapelets are found by evaluating the prediction qualities of numerous candidates extracted from the series segments. In contrast to the state of the art, this paper proposes a novel perspective in terms of learning shapelets. A new mathematical formalization of the task via a classification objective function is proposed, and a tailored stochastic gradient learning algorithm is applied. The proposed method enables learning near-to-optimal shapelets directly, without the need to try out a large number of candidates. Furthermore, our method can learn the true top-K shapelets by capturing their interaction. Extensive experimentation demonstrates statistically significant improvement in terms of wins and ranks against 13 baselines over 28 time-series datasets.
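The core idea of the paper, treating the shapelet itself as a parameter of a differentiable classification objective, can be sketched as follows. This is a toy illustration on synthetic data, not the paper's algorithm: the soft-minimum distance, the numerical gradient (the paper derives it analytically), and all constants are assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy data (hypothetical): class-1 series contain a short bump, class-0 do not.
n, length, slen = 30, 30, 5
X = rng.normal(0.0, 0.2, (n, length))
y = np.array([0, 1] * (n // 2))
bump = np.sin(np.linspace(0, np.pi, slen))
for i in range(n):
    if y[i] == 1:
        p = rng.integers(0, length - slen + 1)
        X[i, p:p + slen] += bump

def soft_min_dist(series, shapelet, alpha=-10.0):
    # Differentiable stand-in for the minimum segment distance (soft-minimum);
    # this is what makes gradient-based shapelet learning possible at all.
    segs = np.lib.stride_tricks.sliding_window_view(series, slen)
    d = ((segs - shapelet) ** 2).mean(axis=1)
    w = np.exp(alpha * (d - d.min()))
    return float((w * d).sum() / w.sum())

def loss(theta):
    # Logistic loss of a linear classifier on the shapelet-distance feature.
    shapelet, w0, w1 = theta[:slen], theta[-2], theta[-1]
    d = np.array([soft_min_dist(x, shapelet) for x in X])
    p = 1.0 / (1.0 + np.exp(-(w0 + w1 * d)))
    eps = 1e-9
    return -float(np.mean(y * np.log(p + eps) + (1 - y) * np.log(1 - p + eps)))

# Jointly optimize the shapelet and the classifier weights by gradient descent;
# a central-difference numerical gradient keeps the sketch short.
theta = np.concatenate([rng.normal(size=slen), [0.0, 0.0]])
h, lr = 1e-5, 0.3
for _ in range(150):
    g = np.zeros_like(theta)
    for j in range(len(theta)):
        tp, tm = theta.copy(), theta.copy()
        tp[j] += h
        tm[j] -= h
        g[j] = (loss(tp) - loss(tm)) / (2 * h)
    theta -= lr * g
final = loss(theta)
```

The shapelet is updated by the same gradient steps as the classifier weights, which is the contrast the abstract draws against enumerating candidate segments.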


European Conference on Machine Learning | 2015

Hyperparameter search space pruning - A new component for sequential model-based hyperparameter optimization

Martin Wistuba; Nicolas Schilling; Lars Schmidt-Thieme

The optimization of hyperparameters is often done manually or exhaustively, but recent work has shown that automatic methods can optimize hyperparameters faster and even achieve better final performance. Sequential model-based optimization (SMBO) is the current state-of-the-art framework for automatic hyperparameter optimization. Currently, it consists of three components: a surrogate model, an acquisition function and an initialization technique. We propose to add a fourth component: pruning of the hyperparameter search space, a common way of accelerating the search in many domains that has not yet been applied to hyperparameter optimization. We propose to discard regions of the search space that are unlikely to contain better hyperparameter configurations by transferring knowledge from past experiments on other data sets, as well as taking into account the evaluations already done on the current data set. Pruning as a new component for SMBO is an orthogonal contribution; nevertheless, we compare it to surrogate models that learn across data sets and extensively investigate the impact of pruning, with and without initialization, for various state-of-the-art surrogate models. The experiments are conducted on two newly created meta-data sets which we make publicly available. One of these meta-data sets is created on 59 data sets using 19 different classifiers, resulting in a total of about 1.3 million experiments. This is more than four times larger than all the results collaboratively collected by OpenML.
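The pruning idea, discarding configurations with a poor track record on past data sets unless the current data set suggests otherwise, can be sketched like this. The meta-data, the rank-based criterion and the `keep_frac` parameter are illustrative assumptions, not the paper's exact rule.

```python
def pruned_candidates(configs, meta_scores, current_obs, keep_frac=0.6):
    """Keep configurations whose average rank over past data sets is good,
    but never discard one that already performed best on the new data set."""
    n = len(configs)
    # Average rank of each configuration across past experiments (1 = best).
    avg_rank = [0.0] * n
    for scores in meta_scores.values():
        order = sorted(range(n), key=lambda i: -scores[i])
        for rank, i in enumerate(order, start=1):
            avg_rank[i] += rank / len(meta_scores)
    # Prune the historically worst region of the search space ...
    k = max(1, int(keep_frac * n))
    promising = set(sorted(range(n), key=lambda i: avg_rank[i])[:k])
    # ... but respect evaluations already done on the current data set.
    if current_obs:
        best = max(current_obs.values())
        promising.update(i for i, s in current_obs.items() if s >= best)
    return sorted(promising)

# Hypothetical meta-data: accuracies of 5 configurations on 3 past data sets.
meta_scores = {
    "d1": [0.61, 0.72, 0.88, 0.90, 0.55],
    "d2": [0.58, 0.70, 0.85, 0.91, 0.50],
    "d3": [0.65, 0.69, 0.80, 0.89, 0.52],
}
# One evaluation already done on the new data set: config 4 looks strong there.
kept = pruned_candidates(range(5), meta_scores, {4: 0.93})
```

Config 0 and the historically weak config 4 would be pruned by rank alone, but config 4 survives because of its strong observation on the current data set.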


European Conference on Machine Learning | 2015

Hyperparameter Optimization with Factorized Multilayer Perceptrons

Nicolas Schilling; Martin Wistuba; Lucas Drumond; Lars Schmidt-Thieme

In machine learning, hyperparameter optimization is a challenging task that is usually approached by experienced practitioners or in a computationally expensive brute-force manner such as grid search. Therefore, recent research proposes to use observed hyperparameter performance on already solved problems (i.e., data sets) in order to speed up the search for promising hyperparameter configurations in the sequential model-based optimization framework. In this paper, we propose multilayer perceptrons as surrogate models, as they are able to model highly nonlinear hyperparameter response surfaces. However, since interactions of hyperparameters, data sets and meta-features are only implicitly learned in the subsequent layers, we improve the performance of multilayer perceptrons by means of an explicit factorization of the interaction weights and call the resulting model a factorized multilayer perceptron. Additionally, we evaluate different ways of obtaining predictive uncertainty, which is a key ingredient for a good trade-off between exploration and exploitation. Our experimental results on two public meta-data sets demonstrate the efficiency of our approach compared to a variety of published baselines. For reproducibility, we make our data sets and all the program code publicly available on our supplementary webpage.
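The factorization of interaction weights can be illustrated as a low-rank first layer: instead of a full weight matrix, each input and each hidden unit gets a small latent vector, and the effective weight is their inner product. Dimensions and the rank `k` below are assumptions for illustration.

```python
import numpy as np

rng = np.random.default_rng(1)
d_in, d_hidden, k = 8, 16, 3   # k = rank of the factorization (an assumption)

# Standard first layer: a full d_in x d_hidden matrix (8 * 16 = 128 weights).
W_full = rng.normal(size=(d_in, d_hidden))

# Factorized first layer: the effective weight between input i and hidden
# unit j is the inner product of their latent vectors, so interactions between
# inputs (hyperparameters, data-set meta-features) are modelled explicitly
# with only (d_in + d_hidden) * k = 72 parameters.
V = rng.normal(scale=0.1, size=(d_in, k))
U = rng.normal(scale=0.1, size=(d_hidden, k))
W_fact = V @ U.T               # low-rank weight matrix, shape (d_in, d_hidden)

x = rng.normal(size=(1, d_in))
h = np.tanh(x @ W_fact)        # hidden activations of the factorized layer
```

Besides saving parameters, tying weights through shared latent vectors lets related inputs influence each other's learned interactions, which is the motivation the abstract gives.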


Computational Intelligence and Games | 2012

Comparison of Bayesian move prediction systems for Computer Go

Martin Wistuba; Lars Schaefers; Marco Platzner

Since the early days of research on Computer Go, move prediction systems have been an important building block for Go-playing programs. Only recently, with the rise of Monte Carlo Tree Search (MCTS) algorithms, has the strength of Computer Go programs increased immensely, while move prediction remains an integral part of state-of-the-art programs. In this paper, we review three Bayesian move prediction systems that have been published in recent years and empirically compare them under equal conditions. Our experiments reveal that, given identical input data, the three systems can achieve almost identical prediction rates while differing substantially in their needs for computational and memory resources. From the analysis of our results, we are able to further improve the prediction rates for all three systems.
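To give a flavour of the family of systems compared, here is a minimal Bradley-Terry-style move predictor: each candidate move is a set of binary features, a move's strength is the product of its feature strengths, and the probability of a move is its strength normalized over all legal moves. The multiplicative update rule, feature names and toy games below are illustrative assumptions, not any of the three reviewed systems.

```python
import math

def move_strength(features, gamma):
    # A move's strength is the product of the strengths of its features.
    s = 1.0
    for f in features:
        s *= gamma[f]
    return s

def predict(candidates, gamma):
    # Probability of each legal move: normalized strengths.
    strengths = [move_strength(c, gamma) for c in candidates]
    z = sum(strengths)
    return [s / z for s in strengths]

# Toy training data: positions with candidate moves and the expert's choice.
gamma = {"atari": 1.0, "edge": 1.0, "capture": 1.0, "random": 1.0}
games = [
    ([{"capture"}, {"edge"}, {"random"}], 0),        # expert played move 0
    ([{"capture", "atari"}, {"random"}], 0),
    ([{"edge"}, {"capture"}], 1),
]

# Simple multiplicative updates pushing probability mass toward expert moves.
for _ in range(50):
    for candidates, chosen in games:
        p = predict(candidates, gamma)
        for i, feats in enumerate(candidates):
            target = 1.0 if i == chosen else 0.0
            for f in feats:
                gamma[f] = max(1e-3, gamma[f] * math.exp(0.1 * (target - p[i])))
```

After training, features that co-occur with expert moves (here "capture") end up with high strength, so the predictor ranks such moves first.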


IEEE International Conference on Data Science and Advanced Analytics | 2015

Learning hyperparameter optimization initializations

Martin Wistuba; Nicolas Schilling; Lars Schmidt-Thieme

Hyperparameter optimization is often done manually or by using a grid search. However, recent research has shown that automatic optimization techniques are able to accelerate this optimization process and find hyperparameter configurations that lead to better models. Currently, transferring knowledge from previous experiments to a new experiment is of particular interest because it has been shown to further improve hyperparameter optimization. We propose to transfer knowledge by means of an initialization strategy for hyperparameter optimization. In contrast to current state-of-the-art initialization strategies, our strategy is neither limited to hyperparameter configurations that have been evaluated on previous experiments nor does it need meta-features. The initial hyperparameter configurations are derived by optimizing for a meta-loss formally defined in this paper. This loss depends on the hyperparameter response function of the data sets that were investigated in past experiments. Since this function is unknown and only a few observations are given, the meta-loss is not differentiable. We propose to approximate the response function by a differentiable plug-in estimator. Then, we are able to learn the initial hyperparameter configuration sequence by applying gradient-based optimization techniques. Extensive experiments are conducted on two meta-data sets. Our initialization strategy is compared to state-of-the-art initialization strategies and further methods that are able to transfer knowledge between data sets. We give empirical evidence that our work provides an improvement over the state of the art.
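The meta-loss idea can be sketched in one dimension: approximate each past data set's response function by a smooth surrogate, define the meta-loss as the best value the initial configurations achieve on each surrogate, and optimize the initial configurations by gradient descent. The quadratic surrogates and numerical gradient below are stand-ins for the paper's plug-in estimator and analytic gradient.

```python
import numpy as np

# Hypothetical smooth surrogates for three past data sets' hyperparameter
# response functions (lower = better), each with its own optimum.
optima = [0.2, 0.5, 0.8]

def response(d, x):
    return (x - optima[d]) ** 2

def meta_loss(init):
    # Each past data set contributes the best value reached by any of the
    # initial configurations; summing over data sets gives the meta-loss.
    return sum(min(response(d, x) for x in init) for d in range(len(optima)))

# Learn a sequence of 2 initial configurations by gradient descent on the
# meta-loss (central-difference gradient keeps the sketch short).
init = np.array([0.0, 1.0])
h, lr = 1e-5, 0.1
for _ in range(300):
    g = np.zeros_like(init)
    for j in range(len(init)):
        ip, im = init.copy(), init.copy()
        ip[j] += h
        im[j] -= h
        g[j] = (meta_loss(ip) - meta_loss(im)) / (2 * h)
    init -= lr * g
```

The two learned configurations spread out to cover the optima of the past data sets, which is exactly what a good initialization sequence should do.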


International Conference on Tools with Artificial Intelligence | 2013

Factorized Decision Trees for Active Learning in Recommender Systems

Rasoul Karimi; Martin Wistuba; Alexandros Nanopoulos; Lars Schmidt-Thieme

A key challenge in recommender systems is how to profile new users. A well-known solution for this problem is to use active learning techniques and ask the new user to rate a few items to reveal her preferences. The sequence of queries should not be static, i.e., in each step the best query depends on the responses of the new user to the previous queries. Decision trees have been proposed to capture the dynamic aspect of this process. In this paper, we improve decision trees in two ways. First, we propose the Most Popular Sampling (MPS) method to increase the speed of the tree construction. In each node, instead of checking all candidate items, only those which are popular among the users associated with the node are examined. Second, we develop a new algorithm to build decision trees. It is called Factorized Decision Trees (FDT) and exploits matrix factorization to predict the ratings at the nodes of the tree. The experimental results on the Netflix dataset show that both contributions are successful: MPS increases the speed of tree construction without harming accuracy, and FDT improves the accuracy of rating predictions, especially in the last queries.
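The MPS step is the simpler of the two contributions and can be sketched directly: at a tree node, count how often each item was rated by the node's users and examine only the most popular ones as split candidates. The rating data below is a hypothetical toy example.

```python
from collections import Counter

def most_popular_candidates(node_users, ratings, top_m=2):
    """MPS sketch: instead of checking all items as split candidates at a
    tree node, examine only the top_m items most rated by the node's users."""
    counts = Counter(item for u in node_users for item in ratings[u])
    return [item for item, _ in counts.most_common(top_m)]

# Hypothetical ratings: user -> {item: rating}.
ratings = {
    "u1": {"A": 5, "B": 3},
    "u2": {"A": 4, "C": 2},
    "u3": {"A": 1, "B": 4, "D": 5},
}
cands = most_popular_candidates(ratings.keys(), ratings, top_m=2)
```

Items rated by few of the node's users are unlikely to make informative queries anyway, which is why restricting the candidate set speeds up construction without hurting accuracy.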


Pacific-Asia Conference on Knowledge Discovery and Data Mining | 2017

Personalized Deep Learning for Tag Recommendation

Hanh T. H. Nguyen; Martin Wistuba; Josif Grabocka; Lucas Drumond; Lars Schmidt-Thieme

Social media services deploy tag recommendation systems to facilitate the tagging of objects, a process that depends on both the user’s preferences and the tagged object. However, most image tag recommender systems do not consider the additional information provided by the uploaded image but rely only on textual information, or make use of simple low-level image features. In this paper, we propose a personalized deep learning approach for image tag recommendation that considers the user’s preferences as well as visual information. We employ Convolutional Neural Networks (CNNs), which already provide excellent performance for image classification and recognition, to obtain visual features from images in a supervised way. We provide empirical evidence that features selected in this fashion improve the capability of tag recommender systems, compared to the current state of the art that uses hand-crafted visual features or is solely based on the tagging history information. The proposed method yields at least a two percent accuracy improvement on two real-world datasets, namely NUS-WIDE and Flickr-PTR.
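The scoring structure of such a personalized recommender can be sketched as a sum of a user-tag preference term and a tag-image visual term. All vectors below are random stand-ins (in the paper, the image representation comes from a CNN and the factors are learned); the latent dimensions and tag names are assumptions.

```python
import numpy as np

rng = np.random.default_rng(3)
k, d_vis = 4, 8                 # latent dim and stand-in CNN feature dim

users = {"alice": rng.normal(size=k)}
tags = {t: rng.normal(size=k) for t in ["beach", "sunset", "cat"]}
tag_vis = {t: rng.normal(size=d_vis) for t in tags}   # visual factors per tag

def score(user, tag, img_feat):
    # Personalized part (user x tag, from tagging history) plus visual part
    # (tag x image features) -- a simplified view of combining both signals.
    return float(users[user] @ tags[tag] + tag_vis[tag] @ img_feat)

def recommend(user, img_feat, top_n=2):
    # Rank all tags by their score for this user and this image.
    return sorted(tags, key=lambda t: -score(user, t, img_feat))[:top_n]

img = rng.normal(size=d_vis)    # stand-in for CNN features of the upload
recs = recommend("alice", img)
```

Because the visual term depends on the uploaded image, two uploads by the same user can receive different tag rankings, which is the gap over history-only recommenders that the abstract highlights.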


Knowledge and Information Systems | 2016

Fast classification of univariate and multivariate time series through shapelet discovery

Josif Grabocka; Martin Wistuba; Lars Schmidt-Thieme

Time-series classification is an important problem for the data mining community due to the wide range of application domains involving time-series data. A recent paradigm, called shapelets, represents patterns that are highly predictive for the target variable. Shapelets are discovered by measuring the prediction accuracy of a set of potential (shapelet) candidates. The candidates typically consist of all the segments of a dataset; therefore, the discovery of shapelets is computationally expensive. This paper proposes a novel method that avoids measuring the prediction accuracy of similar candidates in Euclidean distance space, through an online clustering/pruning technique. In addition, our algorithm incorporates a supervised shapelet selection that filters out only those candidates that improve classification accuracy. Empirical evidence on 45 univariate datasets from the UCR collection demonstrates that our method is 3–4 orders of magnitude faster than the fastest existing shapelet discovery method, while providing better prediction accuracy. In addition, we extend our method to multivariate time-series data. Runtime results over four real-life multivariate datasets indicate that our method can classify MB-scale data in a matter of seconds and GB-scale data in a matter of minutes. The achievements do not compromise quality; on the contrary, our method is even superior to the multivariate baseline in terms of classification accuracy.
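The online pruning idea can be sketched in a few lines: a new shapelet candidate is evaluated only if it is not within some Euclidean distance of one already accepted, since near-duplicate candidates would yield near-identical accuracies anyway. The threshold `eps` and the toy candidates are assumptions for illustration.

```python
import numpy as np

def distinct_candidates(candidates, eps=0.5):
    """Online pruning sketch: accept a shapelet candidate for evaluation only
    if it is farther than eps from every already-accepted candidate."""
    accepted = []
    for c in candidates:
        if all(np.linalg.norm(c - a) > eps for a in accepted):
            accepted.append(c)
    return accepted

rng = np.random.default_rng(4)
base = rng.normal(size=5)
# Two pairs of near-duplicate candidates: each pair collapses to one survivor.
cands = [base, base + 0.01, base + 5.0, base + 5.02]
kept = distinct_candidates(cands)
```

Since the number of candidates grows with the square of the series length, skipping near-duplicates removes most of the accuracy evaluations, which is where the reported speed-up comes from.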


Annual Conference on Artificial Intelligence | 2013

Move Prediction in Go – Modelling Feature Interactions Using Latent Factors

Martin Wistuba; Lars Schmidt-Thieme

Move prediction systems have always been part of strong Go programs. Recent research has revealed that taking interactions between features into account improves the performance of move predictions. In this paper, a factorization model is applied, and a supervised learning algorithm, Latent Factor Ranking (LFR), which takes these interactions into account, is introduced. Its superiority is demonstrated in comparison with other state-of-the-art Go move predictors. LFR improves accuracy by 3% over current state-of-the-art Go move predictors on average, and by 5% in the middle- and endgame. Depending on the dimensionality of the shared latent factor vector, an overall accuracy of over 41% is achieved.
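The feature-interaction idea can be sketched as a factorization-machine-style score: a move gets a linear term per active feature plus a pairwise interaction term computed through shared latent factors. The feature names, latent dimensionality and random initialization below are illustrative assumptions, not the paper's trained model.

```python
import numpy as np

rng = np.random.default_rng(5)
features = ["atari", "edge", "capture", "ladder"]
k = 2                                            # latent dim (an assumption)
w = {f: rng.normal() for f in features}          # per-feature linear weight
v = {f: rng.normal(size=k) for f in features}    # shared latent factors

def move_score(active):
    """Linear terms plus pairwise interactions through the latent factors,
    a sketch of how a factorization model captures feature interactions."""
    active = list(active)
    s = sum(w[f] for f in active)
    for i in range(len(active)):
        for j in range(i + 1, len(active)):
            s += float(v[active[i]] @ v[active[j]])
    return s

def predict_move(candidate_feature_sets):
    # The predicted move is the candidate with the highest score.
    scores = [move_score(c) for c in candidate_feature_sets]
    return int(np.argmax(scores))
```

Modelling interactions through k-dimensional factors rather than a full pairwise weight table keeps the parameter count linear in the number of features, which is what makes the approach tractable for the large feature sets used in Go.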


International Conference on Data Mining | 2015

Sequential Model-Free Hyperparameter Tuning

Martin Wistuba; Nicolas Schilling; Lars Schmidt-Thieme

Hyperparameter tuning is often done manually, but recent research has shown that automatic tuning yields effective hyperparameter configurations even faster and does not require any expertise. To further improve the search, recent publications propose transferring knowledge from previous experiments to new experiments. We adapt sequential model-based optimization by replacing its surrogate model and acquisition function with one policy that is optimized for the task of hyperparameter tuning. This policy generalizes over previous experiments but uses neither a model nor meta-features; nevertheless, it outperforms the state of the art. We show that a static ranking of hyperparameter combinations yields competitive results and substantially outperforms a random hyperparameter search. Thus, it is a fast and easy alternative to complex hyperparameter tuning strategies and allows practitioners to tune their hyperparameters by simply using a look-up table. We made look-up tables for two classifiers publicly available: SVM and AdaBoost. Furthermore, we propose a similarity measure for data sets that yields more comprehensible results than those using meta-features. We show how this similarity measure can be applied to surrogate models in the SMBO framework and empirically show that this change leads to better hyperparameter configurations in fewer trials.
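The static-ranking look-up table can be sketched directly: order the configurations by their average rank over past data sets and simply try them in that fixed order on any new data set. The meta-data below is a hypothetical toy example; the paper builds such tables from large-scale experiments.

```python
# Hypothetical meta-data: accuracy of each configuration on past data sets.
meta = {
    "d1": {"C=0.1": 0.71, "C=1": 0.84, "C=10": 0.80},
    "d2": {"C=0.1": 0.65, "C=1": 0.90, "C=10": 0.88},
    "d3": {"C=0.1": 0.70, "C=1": 0.86, "C=10": 0.91},
}

def static_ranking(meta):
    """Order configurations by average rank over past data sets; trying them
    in this fixed order is the model-free 'look-up table' strategy."""
    configs = sorted(next(iter(meta.values())))
    avg = {c: 0.0 for c in configs}
    for scores in meta.values():
        ranked = sorted(configs, key=lambda c: -scores[c])
        for r, c in enumerate(ranked, start=1):
            avg[c] += r / len(meta)
    return sorted(configs, key=lambda c: avg[c])

order = static_ranking(meta)
```

At tuning time no model is fit and no meta-features are computed; the practitioner just evaluates configurations in the precomputed order, which is why the abstract calls the approach model-free.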

Collaboration


Dive into Martin Wistuba's collaborations.

Top Co-Authors

Josif Grabocka

University of Hildesheim


Lucas Drumond

University of Hildesheim


Maria-Irina Nicolae

Centre national de la recherche scientifique
