Nicolas Schilling
University of Hildesheim
Publications
Featured research published by Nicolas Schilling.
Knowledge Discovery and Data Mining | 2014
Josif Grabocka; Nicolas Schilling; Martin Wistuba; Lars Schmidt-Thieme
Shapelets are discriminative sub-sequences of time series that best predict the target variable. For this reason, shapelet discovery has recently attracted considerable interest within the time-series research community. Currently, shapelets are found by evaluating the prediction qualities of numerous candidates extracted from the series segments. In contrast to the state of the art, this paper proposes a novel perspective: learning shapelets. A new mathematical formalization of the task via a classification objective function is proposed, and a tailored stochastic gradient learning algorithm is applied. The proposed method enables learning near-to-optimal shapelets directly, without the need to evaluate large numbers of candidates. Furthermore, our method can learn the true top-K shapelets by capturing their interactions. Extensive experimentation demonstrates statistically significant improvement in terms of wins and ranks against 13 baselines over 28 time-series datasets.
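The core idea, treating shapelets as parameters of a differentiable classification objective and learning them by gradient descent, can be sketched in a few lines. The following is a minimal illustration, not the authors' code: it assumes a soft-minimum distance as the differentiable surrogate for the minimum segment distance and, for brevity, uses finite differences where the paper derives analytic gradients; all names are ours.

```python
import numpy as np

def soft_min_distances(X, shapelets, alpha=-30.0):
    """Soft-minimum distance between each series and each shapelet.

    X: (n_series, series_len), shapelets: (K, shapelet_len).
    Returns an (n_series, K) feature matrix."""
    n, Q = X.shape
    K, L = shapelets.shape
    J = Q - L + 1                                   # sliding-window segments
    segs = np.stack([X[:, j:j + L] for j in range(J)], axis=1)               # (n, J, L)
    D = ((segs[:, :, None, :] - shapelets[None, None, :, :]) ** 2).mean(-1)  # (n, J, K)
    W = np.exp(alpha * D)                           # soft-min weights over segments
    return (D * W).sum(axis=1) / W.sum(axis=1)      # (n, K)

def train_shapelets(X, y, K=2, L=20, lr=0.01, lam=0.01, epochs=100, seed=0):
    """Jointly learn K shapelets and a logistic classifier (binary y in {0,1})."""
    rng = np.random.default_rng(seed)
    S = rng.normal(size=(K, L))                     # shapelets, random init
    w, b = np.zeros(K), 0.0
    for _ in range(epochs):
        M = soft_min_distances(X, S)
        p = 1.0 / (1.0 + np.exp(-(M @ w + b)))      # sigmoid
        g = p - y                                   # dLoss/dlogit for log loss
        w -= lr * (M.T @ g / len(y) + lam * w)
        b -= lr * g.mean()
        # finite-difference gradient for the shapelets (slow but short;
        # the paper uses the analytic gradient instead)
        eps = 1e-4
        for k in range(K):
            for l in range(L):
                S[k, l] += eps
                dM = (soft_min_distances(X, S) - M)[:, k] / eps
                S[k, l] -= eps
                S[k, l] -= lr * np.mean(g * w[k] * dM)
    return S, w, b
```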
European Conference on Machine Learning | 2015
Martin Wistuba; Nicolas Schilling; Lars Schmidt-Thieme
The optimization of hyperparameters is often done manually or exhaustively, but recent work has shown that automatic methods can optimize hyperparameters faster and even achieve better final performance. Sequential model-based optimization (SMBO) is the current state-of-the-art framework for automatic hyperparameter optimization. Currently, it consists of three components: a surrogate model, an acquisition function, and an initialization technique. We propose to add a fourth component: a way of pruning the hyperparameter search space. Pruning is a common way of accelerating search in many domains but has not yet been applied to hyperparameter optimization. We propose to discard regions of the search space that are unlikely to contain better hyperparameter configurations by transferring knowledge from past experiments on other data sets, as well as by taking into account the evaluations already done on the current data set. Pruning as a new component for SMBO is an orthogonal contribution; nevertheless, we compare it to surrogate models that learn across data sets and extensively investigate the impact of pruning, with and without initialization, for various state-of-the-art surrogate models. The experiments are conducted on two newly created meta-data sets which we make publicly available. One of these meta-data sets is created on 59 data sets using 19 different classifiers, resulting in a total of about 1.3 million experiments. This is more than four times larger than all the results collaboratively collected by OpenML.
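A minimal sketch of what such a pruning step could look like, under our own simplifying assumptions: each past data set "votes" on the candidate configurations via nearest-neighbour predictions, votes are weighted by agreement with the evaluations already made on the new data set, and the lowest-scoring region of the search space is discarded. Function names and the weighting scheme are illustrative, not the paper's.

```python
import numpy as np

def prune_candidates(candidates, past_runs, current_obs, keep_frac=0.5):
    """Keep only the candidate configurations predicted to be promising.

    candidates : (m, d) hyperparameter configurations still under consideration
    past_runs  : list of (configs, accuracies) pairs from previous data sets
    current_obs: (configs, accuracies) already evaluated on the new data set"""
    obs_X, obs_y = current_obs
    scores = np.zeros(len(candidates))
    total_w = 0.0
    for cfgs, accs in past_runs:
        # predicted accuracy of a configuration = accuracy of the nearest
        # configuration that was evaluated on that past data set
        d = np.linalg.norm(candidates[:, None, :] - cfgs[None, :, :], axis=2)
        pred = accs[d.argmin(axis=1)]
        d_obs = np.linalg.norm(obs_X[:, None, :] - cfgs[None, :, :], axis=2)
        pred_obs = accs[d_obs.argmin(axis=1)]
        # weight the past data set by agreement with the current observations
        c = np.corrcoef(pred_obs, obs_y)[0, 1] if len(obs_y) > 1 else 1.0
        w = 0.0 if np.isnan(c) else max(c, 0.0)
        scores += w * pred
        total_w += w
    scores /= max(total_w, 1e-12)
    # discard the region of the search space that is unlikely to improve
    keep = np.argsort(-scores)[:max(1, int(keep_frac * len(candidates)))]
    return candidates[keep]
```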
European Conference on Machine Learning | 2015
Nicolas Schilling; Martin Wistuba; Lucas Drumond; Lars Schmidt-Thieme
In machine learning, hyperparameter optimization is a challenging task that is usually approached by experienced practitioners or in a computationally expensive brute-force manner such as grid search. Therefore, recent research proposes to use observed hyperparameter performance on already solved problems (i.e., data sets) in order to speed up the search for promising hyperparameter configurations in the sequential model-based optimization framework. In this paper, we propose multilayer perceptrons as surrogate models, as they are able to model highly nonlinear hyperparameter response surfaces. However, since interactions of hyperparameters, data sets, and meta-features are only implicitly learned in the subsequent layers, we improve the performance of multilayer perceptrons by means of an explicit factorization of the interaction weights and call the resulting model a factorized multilayer perceptron. Additionally, we evaluate different ways of obtaining predictive uncertainty, which is a key ingredient for a good trade-off between exploration and exploitation. Our experimental results on two public meta-data sets demonstrate the efficiency of our approach compared to a variety of published baselines. For reproducibility, we make our data sets and all program code publicly available on our supplementary webpage.
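One plausible reading of the factorized interaction weights is a factorization-machine-style pairwise term added to an ordinary multilayer perceptron. The sketch below shows only the forward pass under that assumption; shapes and names are ours, and the paper's exact architecture may differ.

```python
import numpy as np

def fmlp_forward(x, V, W1, b1, w2, b2):
    """Forward pass: an MLP plus a factorized pairwise-interaction term.

    x      : (d,) input = hyperparameters, data set indicators, meta-features
    V      : (d, k) factorization of the pairwise interaction weights
    W1, b1 : (h, d), (h,) first-layer weights and biases
    w2, b2 : (h,), scalar output layer"""
    # factorization-machine identity: sum_{i<j} <v_i, v_j> x_i x_j
    xv = x @ V
    pairwise = 0.5 * float((xv ** 2 - (x ** 2) @ (V ** 2)).sum())
    h = np.maximum(0.0, W1 @ x + b1)           # ReLU hidden layer
    return float(w2 @ h + b2) + pairwise       # predicted hyperparameter performance

# tiny usage example with random parameters
rng = np.random.default_rng(0)
d, k, h = 10, 4, 16
x = rng.normal(size=d)
print(fmlp_forward(x, 0.1 * rng.normal(size=(d, k)),
                   rng.normal(size=(h, d)), np.zeros(h),
                   rng.normal(size=h), 0.0))
```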
IEEE International Conference on Data Science and Advanced Analytics | 2015
Martin Wistuba; Nicolas Schilling; Lars Schmidt-Thieme
Hyperparameter optimization is often done manually or by using grid search. However, recent research has shown that automatic optimization techniques are able to accelerate this optimization process and find hyperparameter configurations that lead to better models. Currently, transferring knowledge from previous experiments to a new experiment is of particular interest because it has been shown to further improve hyperparameter optimization. We propose to transfer knowledge by means of an initialization strategy for hyperparameter optimization. In contrast to the current state-of-the-art initialization strategies, our strategy is neither limited to hyperparameter configurations that have been evaluated on previous experiments nor does it need meta-features. The initial hyperparameter configurations are derived by optimizing a meta-loss formally defined in this paper. This loss depends on the hyperparameter response functions of the data sets that were investigated in past experiments. Since these functions are unknown and only a few observations are given, the meta-loss is not differentiable. We propose to approximate each response function by a differentiable plug-in estimator. Then, we are able to learn the initial hyperparameter configuration sequence by applying gradient-based optimization techniques. Extensive experiments are conducted on two meta-data sets. Our initialization strategy is compared to state-of-the-art initialization strategies and further methods that are able to transfer knowledge between data sets. We give empirical evidence that our work provides an improvement over the state of the art.
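To make the construction concrete, here is a minimal sketch under our own assumptions: Nadaraya-Watson kernel regression stands in for the differentiable plug-in estimator, a softmin replaces the non-differentiable minimum in the meta-loss, and finite differences replace the paper's analytic gradients. All names are illustrative.

```python
import numpy as np

def response_hat(x, cfgs, perf, gamma=1.0):
    """Differentiable plug-in estimate of one data set's response function
    (Nadaraya-Watson kernel regression over its observed evaluations)."""
    w = np.exp(-gamma * ((cfgs - x) ** 2).sum(axis=1))
    return (w @ perf) / (w.sum() + 1e-12)

def meta_loss(inits, meta_data, beta=20.0):
    """For every past data set, the (soft) best error that any of the
    initial configurations reaches; averaged over data sets."""
    total = 0.0
    for cfgs, perf in meta_data:           # perf = validation error, lower is better
        vals = np.array([response_hat(x, cfgs, perf) for x in inits])
        total += -np.log(np.exp(-beta * vals).sum()) / beta   # smooth minimum
    return total / len(meta_data)

def learn_initialization(meta_data, n_init=3, d=2, lr=0.05, steps=200, seed=0):
    """Learn the initial configurations by gradient descent on the meta-loss."""
    rng = np.random.default_rng(seed)
    inits = rng.normal(size=(n_init, d))
    eps = 1e-4
    for _ in range(steps):
        grad = np.zeros_like(inits)
        base = meta_loss(inits, meta_data)
        for i in range(n_init):            # finite-difference gradient
            for j in range(d):
                inits[i, j] += eps
                grad[i, j] = (meta_loss(inits, meta_data) - base) / eps
                inits[i, j] -= eps
        inits -= lr * grad
    return inits
```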
ACM Transactions on Knowledge Discovery from Data | 2016
Josif Grabocka; Nicolas Schilling; Lars Schmidt-Thieme
Motifs are the most repetitive/frequent patterns of a time series. The discovery of motifs is crucial for practitioners in order to understand and interpret the phenomena occurring in sequential data. Currently, motifs are searched for among series sub-sequences, aiming at selecting the most frequently occurring ones. Search-based methods, which try out series sub-sequences as motif candidates, are currently believed to be the best methods for finding the most frequent patterns. However, this paper proposes an entirely new perspective on finding motifs. We demonstrate that searching is non-optimal, since the domain of motifs is restricted, and instead we propose a principled optimization approach able to find optimal motifs. We treat the occurrence frequency as a function and time-series motifs as its parameters; therefore, we learn the optimal motifs that maximize the frequency function. In contrast to searching, our method is able to discover the most repetitive patterns (hence optimal), even in cases where they do not explicitly occur as sub-sequences. Experiments on several real-life time-series datasets show that the motifs found by our method are substantially more frequent than the ones found through searching, for exactly the same distance threshold.
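The optimization view can be illustrated compactly: replace the hard "distance below threshold T" indicator with a sigmoid so that the occurrence frequency becomes differentiable in the motif, then ascend its gradient. This is a toy sketch under those assumptions, not the paper's algorithm.

```python
import numpy as np

def learn_motif(series, L=20, T=0.1, beta=10.0, lr=0.05, steps=500, seed=0):
    """Learn a motif by gradient ascent on a smooth occurrence frequency:
    freq(motif) = sum_j sigmoid(beta * (T - dist(segment_j, motif)))."""
    rng = np.random.default_rng(seed)
    segs = np.stack([series[j:j + L] for j in range(len(series) - L + 1)])
    motif = segs[rng.integers(len(segs))].copy()   # warm start from one segment
    for _ in range(steps):
        d = ((segs - motif) ** 2).mean(axis=1)     # distance to every segment
        s = 1.0 / (1.0 + np.exp(beta * (d - T)))   # soft "occurrence" indicator
        # chain rule through the sigmoid and the squared distance
        grad = ((s * (1.0 - s) * (-beta) * 2.0 / L)[:, None] * (motif - segs)).sum(axis=0)
        motif += lr * grad                          # ascend the frequency
    return motif
```

Note that the learned motif is free to lie between segments, which is exactly why it can be more frequent than any sub-sequence found by searching.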
Conference on Information and Knowledge Management | 2016
Nghia Duong-Trung; Nicolas Schilling; Lars Schmidt-Thieme
Previous research on content-based geolocation has generally developed prediction methods by pre-partitioning the space and applying classification methods. The input of these methods is the concatenation of individual tweets over a period of time. Unfortunately, these methods have drawbacks: they discard the natural real-valued properties of latitude and longitude, and they fail to capture geolocation in near real time. In this work, we develop a novel generative content-based regression model via a matrix factorization technique to tackle the near real-time geolocation prediction problem. With this model, we aim to address two unanswered questions. First, we show that near real-time geolocation prediction can be accomplished if we leave out the concatenation. Second, we account for the real-valued properties of physical coordinates within a regression solution. We apply our model to Twitter data sets as an example to demonstrate its effectiveness and generality. Our experimental results show that the proposed model, in the best scenario, outperforms a set of state-of-the-art regression models, including Support Vector Machines and Factorization Machines, reducing the median localization error by up to 79%.
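As a rough illustration of regressing coordinates directly, the sketch below trains one factorization-machine regressor per coordinate on a feature vector for a single tweet (e.g., bag-of-words). Treating latitude and longitude as independent targets and the plain SGD are our simplifications; they are not the paper's generative model.

```python
import numpy as np

def fm_predict(x, w0, w, V):
    """Factorization machine: w0 + <w, x> + sum_{i<j} <v_i, v_j> x_i x_j."""
    xv = x @ V
    pairwise = 0.5 * float((xv ** 2 - (x ** 2) @ (V ** 2)).sum())
    return w0 + float(w @ x) + pairwise

def train_geo_fm(X, coords, k=8, lr=0.01, epochs=50, seed=0):
    """One independent FM per coordinate (latitude, longitude), trained by SGD.

    X: (n, d) tweet feature vectors; coords: (n, 2) latitude/longitude."""
    rng = np.random.default_rng(seed)
    n, d = X.shape
    models = []
    for t in range(2):                               # 0: latitude, 1: longitude
        w0, w, V = 0.0, np.zeros(d), 0.01 * rng.normal(size=(d, k))
        for _ in range(epochs):
            for i in rng.permutation(n):
                x, y = X[i], coords[i, t]
                err = fm_predict(x, w0, w, V) - y    # squared-loss residual
                w0 -= lr * err
                w -= lr * err * x
                xv = x @ V                           # FM gradient w.r.t. V
                V -= lr * err * (np.outer(x, xv) - (x ** 2)[:, None] * V)
        models.append((w0, w, V))
    return models
```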
International Conference on Data Mining | 2015
Martin Wistuba; Nicolas Schilling; Lars Schmidt-Thieme
Hyperparameter tuning is often done manually, but recent research has shown that automatic tuning yields effective hyperparameter configurations faster and does not require any expertise. To further improve the search, recent publications propose transferring knowledge from previous experiments to new experiments. We adapt sequential model-based optimization by replacing its surrogate model and acquisition function with a single policy that is optimized for the task of hyperparameter tuning. This policy generalizes over previous experiments but uses neither a model nor meta-features; nevertheless, it outperforms the state of the art. We show that a static ranking of hyperparameter combinations yields competitive results and substantially outperforms a random hyperparameter search. It is thus a fast and easy alternative to complex hyperparameter tuning strategies and allows practitioners to tune their hyperparameters by simply using a look-up table. We make look-up tables for two classifiers, SVM and AdaBoost, publicly available. Furthermore, we propose a similarity measure for data sets that yields more comprehensible results than those using meta-features. We show how this similarity measure can be applied to surrogate models in the SMBO framework and empirically show that this change leads to better hyperparameter configurations in fewer trials.
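The static ranking itself is simple enough to state in code: rank every configuration on every past data set, average the ranks, and try configurations on a new data set in that fixed order. A minimal sketch with illustrative names:

```python
import numpy as np

def static_ranking(perf):
    """Static hyperparameter ranking from a meta-data matrix.

    perf[i, j] = accuracy of configuration j on past data set i.
    Returns configuration indices in the order they should be tried."""
    # rank configurations per data set (0 = best), then average the ranks
    ranks = (-perf).argsort(axis=1).argsort(axis=1)
    return ranks.mean(axis=0).argsort()

# usage: on any new data set, evaluate configurations in this fixed order
perf = np.array([[0.81, 0.92, 0.75],
                 [0.68, 0.88, 0.71]])
order = static_ranking(perf)          # -> array([1, 0, 2])
```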
European Conference on Machine Learning | 2016
Martin Wistuba; Nicolas Schilling; Lars Schmidt-Thieme
The choice of hyperparameters and the selection of algorithms are a crucial part of machine learning. Bayesian optimization methods have been used very successfully to tune hyperparameters automatically, in many cases even outperforming the human expert. Recently, these techniques have been greatly improved by using meta-knowledge. The idea is to use knowledge of the performance of an algorithm on other data sets to automatically accelerate the hyperparameter optimization for a new data set. In this work we present a model that transfers this knowledge in two stages. At the first stage, the function that maps hyperparameter configurations to hold-out validation performances is approximated for previously seen data sets. At the second stage, these approximations are combined to rank the hyperparameter configurations for a new data set. In extensive experiments on the problem of hyperparameter optimization, as well as the problem of combined algorithm selection and hyperparameter optimization, we outperform the state-of-the-art methods. The software related to this paper is available at https://github.com/wistuba/TST.
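A rough sketch of the two-stage idea, with our own stand-ins: a 1-nearest-neighbour regressor replaces the per-data-set approximations of stage one, and a simple pairwise rank-agreement weight replaces the paper's combination scheme in stage two.

```python
import numpy as np

def fit_stage_one(cfgs, accs):
    """Stage 1: approximate one past data set's hyperparameter response
    surface (here a 1-nearest-neighbour regressor for brevity)."""
    def predict(X):
        d = np.linalg.norm(X[:, None, :] - cfgs[None, :, :], axis=2)
        return accs[d.argmin(axis=1)]
    return predict

def two_stage_rank(candidates, surrogates, obs_X, obs_y, rho=0.5):
    """Stage 2: weight each past surrogate by how well it orders the
    configurations already evaluated on the new data set, then rank."""
    scores = np.zeros(len(candidates))
    for predict in surrogates:
        pred = predict(obs_X)
        if len(obs_y) > 1:
            # fraction of concordant pairs between prediction and observation
            agree = np.mean([(pred[i] < pred[j]) == (obs_y[i] < obs_y[j])
                             for j in range(len(obs_y)) for i in range(j)])
        else:
            agree = 1.0
        w = max(0.0, (agree - rho) / (1.0 - rho))  # keep well-matching data sets
        scores += w * predict(candidates)
    return np.argsort(-scores)                     # candidate indices, best first
```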
European Conference on Machine Learning | 2016
Nicolas Schilling; Martin Wistuba; Lars Schmidt-Thieme
In machine learning, hyperparameter optimization is a challenging but necessary task that is usually approached in a computationally expensive manner such as grid search. For this reason, surrogate-based black-box optimization techniques such as sequential model-based optimization have been proposed, which allow for faster hyperparameter optimization. Recent research proposes to also integrate hyperparameter performances on past data sets to allow for a faster and more efficient hyperparameter optimization. In this paper, we use products of Gaussian process experts as surrogate models for hyperparameter optimization. Gaussian processes are a natural choice, as they offer good prediction accuracy as well as estimates of their uncertainty. Additionally, their hyperparameters can be tuned very effectively. However, for large meta-data sets, learning a single Gaussian process is not feasible, as it involves the inversion of a large kernel matrix. This directly limits their usefulness for hyperparameter optimization when large-scale hyperparameter performances on past data sets are given. Using products of Gaussian process experts circumvents these scalability issues; however, this usually comes at the price of lower predictive accuracy. In our experiments, we show empirically that products of experts nevertheless perform very well compared to a variety of published surrogate models. Thus, we propose a surrogate model that performs as well as the current state of the art, is scalable to large-scale meta-knowledge, has no hyperparameters of its own, and is even very easy to parallelize. The software related to this paper is available at https://github.com/nicoschilling/ECML2016.
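The product-of-experts combination itself is compact: each past data set gets its own Gaussian process, and the experts' predictions are combined by adding precisions and precision-weighting the means. A minimal numpy sketch with an RBF kernel (kernel hyperparameters fixed for brevity; names are ours):

```python
import numpy as np

def gp_posterior(X, y, Xs, gamma=1.0, noise=1e-2):
    """Exact GP regression with an RBF kernel; mean and variance at Xs."""
    def K(A, B):
        d2 = ((A[:, None, :] - B[None, :, :]) ** 2).sum(-1)
        return np.exp(-gamma * d2)
    Kinv = np.linalg.inv(K(X, X) + noise * np.eye(len(X)))
    Ks = K(Xs, X)
    mu = Ks @ Kinv @ y
    var = 1.0 + noise - np.einsum('ij,jk,ik->i', Ks, Kinv, Ks)
    return mu, np.maximum(var, 1e-12)

def product_of_experts(experts_data, Xs):
    """Product of experts: precisions add, means are precision-weighted.
    Each expert only ever inverts its own (small) kernel matrix."""
    prec_sum = np.zeros(len(Xs))
    mean_acc = np.zeros(len(Xs))
    for X, y in experts_data:              # one expert per past data set
        mu, var = gp_posterior(X, y, Xs)
        prec_sum += 1.0 / var
        mean_acc += mu / var
    return mean_acc / prec_sum, 1.0 / prec_sum   # combined mean and variance
```

The scalability and the easy parallelism both follow from the loop body: experts are independent until the final aggregation.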
Proceedings of the 3rd IKDD Conference on Data Science | 2016
Mit Shah; Josif Grabocka; Nicolas Schilling; Martin Wistuba; Lars Schmidt-Thieme
Shapelets are discriminative patterns in time series that best predict the target variable when their distances to the respective time series are used as features for a classifier. Since a shapelet can be any sequence whose length is at most that of the shortest time series in the data set, the number of possible shapelets is enormous. Initially, shapelets were found by extracting numerous candidates and evaluating them for their prediction quality. Then, Grabocka et al. [2] proposed a novel approach of learning time-series shapelets called LTS: a new mathematical formalization of the task via a classification objective function, optimized with a tailored stochastic gradient learning algorithm. It enables learning near-to-optimal shapelets without the overhead of trying out large numbers of candidates. LTS uses the Euclidean distance as its distance measure. As a limitation, it is not able to learn a single shapelet that is representative of different time-series subsequences that are merely warped along the time axis. To cover these cases, we propose to use Dynamic Time Warping (DTW) as the distance measure in the LTS framework. The proposed approach was evaluated on 11 real-world data sets from the UCR repository and a synthetic data set we created. The experimental results show that the proposed approach outperforms the existing methods on these data sets.
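The distance swap at the heart of the proposal can be shown directly: compute the shapelet feature as the minimum DTW distance over all same-length segments instead of the minimum Euclidean distance. Plain DTW, sketched below via dynamic programming, is not differentiable everywhere, so a gradient-based learner needs a smoothed treatment in practice; this sketch (with names of our choosing) shows only the distance computation.

```python
import numpy as np

def dtw(a, b):
    """Dynamic-time-warping distance between two 1-D sequences."""
    n, m = len(a), len(b)
    D = np.full((n + 1, m + 1), np.inf)
    D[0, 0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            cost = (a[i - 1] - b[j - 1]) ** 2
            # extend the cheapest of the three admissible warping steps
            D[i, j] = cost + min(D[i - 1, j], D[i, j - 1], D[i - 1, j - 1])
    return D[n, m]

def shapelet_feature(series, shapelet):
    """Minimum DTW distance between the shapelet and any same-length segment,
    i.e. the shapelet feature with DTW in place of the Euclidean distance."""
    L = len(shapelet)
    return min(dtw(series[j:j + L], shapelet)
               for j in range(len(series) - L + 1))
```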