Network


Latest external collaborations at the country level.

Hotspot


Dive into the research topics where Joaquin Vanschoren is active.

Publication


Featured research published by Joaquin Vanschoren.


SIGKDD Explorations | 2014

OpenML: networked science in machine learning

Joaquin Vanschoren; Jan N. van Rijn; Bernd Bischl; Luís Torgo

Many sciences have made significant breakthroughs by adopting online tools that help organize, structure and mine information that is too detailed to be printed in journals. In this paper, we introduce OpenML, a place for machine learning researchers to share and organize data in fine detail, so that they can work more effectively, be more visible, and collaborate with others to tackle harder problems. We discuss how OpenML relates to other examples of networked science and what benefits it brings for machine learning research, individual scientists, as well as students and practitioners.


Machine Learning | 2012

Experiment databases

Joaquin Vanschoren; Hendrik Blockeel; Bernhard Pfahringer; Geoffrey Holmes

Thousands of machine learning research papers contain extensive experimental comparisons. However, the details of those experiments are often lost after publication, making it impossible to reuse these experiments in further research, or reproduce them to verify the claims made. In this paper, we present a collaboration framework designed to easily share machine learning experiments with the community, and automatically organize them in public databases. This enables immediate reuse of experiments for subsequent, possibly much broader investigation and offers faster and more thorough analysis based on a large set of varied results. We describe how we designed such an experiment database, currently holding over 650,000 classification experiments, and demonstrate its use by answering a wide range of interesting research questions and by verifying a number of recent studies.
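The kind of querying such a shared experiment database enables can be sketched with a tiny relational stand-in. The schema, algorithm names, and accuracy values below are invented for illustration and are not the database's actual layout:

```python
import sqlite3

# In-memory stand-in for an experiment database; schema and rows are made up.
conn = sqlite3.connect(":memory:")
conn.execute("""CREATE TABLE experiment (
    algorithm TEXT, dataset TEXT, param_k INTEGER, accuracy REAL)""")
conn.executemany(
    "INSERT INTO experiment VALUES (?, ?, ?, ?)",
    [("J48", "iris", None, 0.94), ("kNN", "iris", 1, 0.96),
     ("kNN", "iris", 5, 0.97), ("J48", "letter", None, 0.88),
     ("kNN", "letter", 1, 0.95), ("kNN", "letter", 5, 0.93)])

# One of the "research questions" a shared database makes immediate:
# which stored algorithm run performed best on each dataset?
best = conn.execute("""
    SELECT dataset, algorithm, MAX(accuracy)
    FROM experiment GROUP BY dataset""").fetchall()
for dataset, algorithm, acc in best:
    print(dataset, algorithm, acc)
```

In SQLite, a bare column next to `MAX()` is taken from the row that achieves the maximum, so each group returns the winning algorithm along with its score.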


ACM Computing Surveys | 2013

A survey of intelligent assistants for data analysis

Floarea Serban; Joaquin Vanschoren; Jörg-Uwe Kietz; Abraham Bernstein

Research and industry increasingly make use of large amounts of data to guide decision-making. To do this, however, data needs to be analyzed in typically nontrivial refinement processes, which require technical expertise about methods and algorithms, experience with how a precise analysis should proceed, and knowledge about an exploding number of analytic approaches. To alleviate these problems, a plethora of different systems have been proposed that “intelligently” help users to analyze their data. This article provides a first survey of almost 30 years of research on intelligent discovery assistants (IDAs). It explicates the types of help IDAs can provide to users and the kinds of (background) knowledge they leverage to provide this help. Furthermore, it provides an overview of the systems developed over the past years, identifies their most important features, and sketches an ideal future IDA as well as the challenges on the road ahead.


European Conference on Principles of Data Mining and Knowledge Discovery | 2007

Experiment Databases: Towards an Improved Experimental Methodology in Machine Learning

Hendrik Blockeel; Joaquin Vanschoren

Machine learning research often has a large experimental component. While the experimental methodology employed in machine learning has improved much over the years, repeatability of experiments and generalizability of results remain a concern. In this paper we propose a methodology based on the use of experiment databases. Experiment databases facilitate large-scale experimentation, guarantee repeatability of experiments, improve reusability of experiments, help make explicit the conditions under which certain results are valid, and support quick hypothesis testing as well as hypothesis generation. We show that they have the potential to significantly increase the ease with which new results in machine learning can be obtained and correctly interpreted.




European Conference on Machine Learning | 2013

OpenML: a collaborative science platform

Jan N. van Rijn; Bernd Bischl; Luís Torgo; Bo Gao; Venkatesh Umaashankar; Simon Fischer; Patrick Winter; Bernd Wiswedel; Michael R. Berthold; Joaquin Vanschoren

We present OpenML, a novel open science platform that provides easy access to machine learning data, software and results to encourage further study and application. It organizes all submitted results online so they can be easily found and reused, and features a web API which is being integrated in popular machine learning tools such as Weka, KNIME, RapidMiner and R packages, so that experiments can be shared easily.


Intelligent Data Analysis | 2015

Fast Algorithm Selection Using Learning Curves

Jan N. van Rijn; Salisu Mamman Abdulrahman; Pavel Brazdil; Joaquin Vanschoren

One of the challenges in machine learning is to find a classifier and parameter settings that work well on a given dataset. Evaluating all possible combinations typically takes too much time, hence many solutions have been proposed that attempt to predict which classifiers are most promising to try. As the first recommended classifier is not always the correct choice, multiple recommendations should be made, making this a ranking problem rather than a classification problem. Even though this is a well-studied problem, there is currently no good way of evaluating such rankings. We advocate the use of Loss Time Curves, as used in the optimization literature. These visualize the amount of budget (time) needed to converge to an acceptable solution. We also investigate a method that utilizes the measured performances of classifiers on small samples of data to make such recommendations, and adapt it so that it works well in loss-time space. Experimental results show that this method converges extremely fast to an acceptable solution.
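The loss-time idea can be sketched in a few lines: given classifier evaluations in the order a ranking recommends trying them, plot the best loss found so far against cumulative runtime. The runtimes and losses below are invented for illustration:

```python
def loss_time_curve(evaluations):
    """Turn (runtime_seconds, loss) pairs, in recommendation order,
    into a loss-time curve: cumulative time vs. best loss so far."""
    curve, elapsed, best = [], 0.0, float("inf")
    for runtime, loss in evaluations:
        elapsed += runtime
        best = min(best, loss)
        curve.append((elapsed, best))
    return curve

# A hypothetical ranking where the second recommendation is the real winner:
# the curve drops early and cheaply, which is exactly what a good ranking buys.
curve = loss_time_curve([(10.0, 0.20), (2.0, 0.05), (60.0, 0.07)])
print(curve)  # [(10.0, 0.2), (12.0, 0.05), (72.0, 0.05)]
```

The area under (or the time to reach a target loss on) this curve is what makes one ranking measurably better than another, even when both eventually find the same best classifier.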


Discovery Science | 2014

Algorithm Selection on Data Streams

Jan N. van Rijn; Geoffrey Holmes; Bernhard Pfahringer; Joaquin Vanschoren

We explore the possibilities of meta-learning on data streams, in particular algorithm selection. In a first experiment we calculate the characteristics of a small sample of a data stream, and try to predict which classifier performs best on the entire stream. This yields promising results and interesting patterns. In a second experiment, we build a meta-classifier that predicts, based on measurable data characteristics in a window of the data stream, the best classifier for the next window. The results show that this meta-algorithm is very competitive with state-of-the-art ensembles, such as OzaBag, OzaBoost and Leveraged Bagging. The results of all experiments are made publicly available in an online experiment database, for the purpose of verifiability, reproducibility and generalizability.
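The windowed setup of the second experiment can be sketched with toy classifiers. Below, instead of the paper's meta-classifier (which predicts from measured data characteristics), a deliberately simplified stand-in picks whichever classifier won on the previous window; all names and data are invented:

```python
def windows(stream, size):
    # Split a data stream into consecutive fixed-size windows.
    return [stream[i:i + size] for i in range(0, len(stream), size)]

def accuracy(classifier, window):
    # Fraction of (x, y) pairs in the window predicted correctly.
    return sum(classifier(x) == y for x, y in window) / len(window)

# Two toy "classifiers" standing in for real stream learners.
always_zero = lambda x: 0
threshold = lambda x: 1 if x > 0.5 else 0

def select_per_window(stream, size, classifiers):
    """For each window, apply the classifier that scored best on the
    previous window, and record the accuracy achieved on the current one."""
    wins = windows(stream, size)
    results = []
    for prev, cur in zip(wins, wins[1:]):
        best = max(classifiers, key=lambda c: accuracy(c, prev))
        results.append(accuracy(best, cur))
    return results

stream = [(0.9, 1), (0.8, 1), (0.1, 0), (0.2, 0)] * 3
print(select_per_window(stream, 4, [always_zero, threshold]))  # [1.0, 1.0]
```

The paper's contribution is precisely to replace the "previous winner" heuristic with a learned mapping from window characteristics to the best classifier.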


European Conference on Machine Learning | 2012

MDL-based analysis of time series at multiple time-scales

Ugo Vespier; Arno Knobbe; Siegfried Nijssen; Joaquin Vanschoren

The behavior of many complex physical systems is affected by a variety of phenomena occurring at different temporal scales. Time series data produced by measuring properties of such systems often mirrors this fact by appearing as a composition of signals across different time scales. When the final goal of the analysis is to model the individual phenomena affecting a system, it is crucial to be able to recognize the right temporal scales and to separate the individual components of the data. In this paper, we approach this challenge through a combination of the Minimum Description Length (MDL) principle, feature selection strategies, and convolution techniques from the signal processing field. As a result, our algorithm produces a good decomposition of a given time series and, as a side effect, builds a compact representation of its identified components. Experiments demonstrate that our method manages to identify correctly both the number and the temporal scale of the components for real-world as well as artificial data and show the usefulness of our method as an exploratory tool for analyzing time series data.
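One ingredient of this approach, separating components at different time scales by convolution, can be sketched with a moving average; the signal below is synthetic and the MDL-based selection of the right scale is omitted:

```python
import math

def moving_average(signal, width):
    """Smooth a signal by convolving with a flat window -- the
    convolution step used here to isolate the slow component."""
    half = width // 2
    out = []
    for i in range(len(signal)):
        lo, hi = max(0, i - half), min(len(signal), i + half + 1)
        out.append(sum(signal[lo:hi]) / (hi - lo))
    return out

# A synthetic two-scale signal: a slow sine plus a fast alternation.
n = 200
slow = [math.sin(2 * math.pi * i / 100) for i in range(n)]
fast = [0.3 * (-1) ** i for i in range(n)]
signal = [s + f for s, f in zip(slow, fast)]

# A wide moving average recovers the slow component; the residual
# approximates the fast component.
estimate = moving_average(signal, 10)
residual = [x - e for x, e in zip(signal, estimate)]
```

In the paper the window width is not hand-picked as it is here: the MDL principle scores candidate decompositions and selects both the number of components and their scales.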


Pacific Rim International Conference on Artificial Intelligence | 2008

Learning from the Past with Experiment Databases

Joaquin Vanschoren; Bernhard Pfahringer; Geoffrey Holmes

Thousands of machine learning research papers contain experimental comparisons that have usually been conducted with a single focus of interest, and whose detailed results are often lost after publication. Yet, when all these past experiments are collected in experiment databases, they can readily be reused for additional and possibly much broader investigation. In this paper, we make use of such a database to answer various interesting research questions about learning algorithms and to verify a number of recent studies. Alongside performing elaborate comparisons of algorithms, we also investigate the effects of algorithm parameters and data properties, and seek deeper insights into the behavior of learning algorithms by studying their learning curves and bias-variance profiles.

Collaboration


Dive into Joaquin Vanschoren's collaborations.

Top Co-Authors

Hendrik Blockeel
Katholieke Universiteit Leuven

Lars Kotthoff
University of British Columbia

Bo Gao
Katholieke Universiteit Leuven